In this lesson, we will create a report on the number of vists to the store for each customer as shown below using the same data set from "Basic Commands"
Number of store visits for each customerTutorial
: |
We will be introducing the use of a new command "xtdelnul" to remove records with null values. Below is the process flow to command the necessary commands and generate the report.
Solution: The process
Removing transaction records with null customer number deletes purchasing history of non members. In this data set, customer number is present in all transactions and therefore no records are removed.
What's the exact definition of number of visits to the store? There are three methods to determine store visits.
First is accounting for the receipt number in transaction for each customer. However, there is a common scenario where customers has left out one or two items on their shopping list before they checkout, and they will go back into the store and make a second purchase on the same day. These cases will be considered as 2 visits since the unit is based on each unique receipt number.
The second method solves the problem with multiple visits on the same day. All visits to the store, whether it's a purchase made in the morning and a purchase made at night will be regarded as one visit to the store.
The third method is the combination of the two, which counts visits took place within a defined time span as one visit (for example within 5 mintues). On the other hand, this method complicates the processing. In this lesson, we will stick with method 2, where a unique "date + receipt" will be counted as one visit.
Let's go over how step 1 in the process flow is carried out.
Remove null values in a record - xtdelnul
The usage of the command is : "xtdelnul -f customer"
The following script finds out the number of vists to the store for each customer, let's see how the actual script is written below.
#/bin/bash #=============================================================== # MUSASHI bash script #=============================================================== #---- Title title="Number of visits" #---- Comment comment="Tutorial" #---- Variables inPath="/mnt/h00/tutorial" #--------------------------------------------------------------- # Commands #--------------------------------------------------------------- xtdelnul -f customer -i $inPath/dat.xt | xtcut -f customer,date | xtuniq -k customer,date | xtcount -k customer -a NumberofVisits | xtcut -f customer,numberofVisits | xtheader -l "$title" -c "$comment" -o visit.xt #=============================================================== |
Check your results to make sure the commands are properly executed.
Let's create another script where the method of accounting for the number of visits is based on each unique receipt number.
The process flow of the script is as follows:
Below is the script to execute the report, check your results and make sure the commands are probably executed.
#/bin/bash #=============================================================== # MUSASHI bash script #=============================================================== #---- Title title="Number of visits" #---- Comment comment="Tutorial" #---- Variables inPath="/mnt/h00/tutorial" #--------------------------------------------------------------- # Commands #--------------------------------------------------------------- xtdelnul -f customer -i $inPath/dat.xt | xtcut -f customer,date | xtuniq -k customer,date | xtcount -k customer -a storevisits | xtcut -f customer,storevisits >xxa xtdelnul -f customer -i $inPath/dat.xt | xtcut -f customer,date,receipt | xtuniq -k customer,date,receipt | xtcount -k customer -a numberofvisits | xtjoin -k customer -m xxa -f storevisits | xtcut -f customer,numberofvisits,storevisits | xtheader -l "$title" -c "$comment" -o visit.xt rm -f xx* #=============================================================== |
All temporary files used in the script and store in the current directory. When processing is done, the files can be removed by adding "rm -f xx*" to the last line of the script (-f means force delete, never prompt). Large temporary files left in the directory takes up a lot of disk space esp. when there are multiple temporary files. Therefore, a fix format for the workfiles (e.g. xx in front of file name) can ease remove process by specifying xx*. |
The following are additional exercises on finding out number of visits. Check your results with the scripts and ouput given below.
Report Name | Script Name | Result (xt) | Result (html) |
Top 20 customers with the most number of visits | visit1.sh | visit1.xt | visit1.html |
Top 5 customer with the most number of visits in each month | visit2.sh | visit2.xt | visit2.html |
Top 20 customers with the most number of visits who had purchased isotonic drink (140323) | visit3.sh | visit3.xt | visit3.html |