In this lesson, we will create a report showing the average days between visits based on the artificial data used in "Basic Commands".
The average days between visits to the storeTutorial
: |
Intervisit days is the number of days a customer visits to the store since the last visit. It is calculated by taking the average of intervisit days for all visits to the store. In order to calculate the difference between days, we will use a command to slide the current date over into a new column.
Below is the process flow from processing the raw data to generate the report, think through each step before proceeding to create the script.
Answer: Process flow
Next, let's discuss how the sliding of data works in step 4.
xtslide shifts cells on the field defined in -f and slides them to a new column based on the key field defined at -k. For example, the figure on the left belows shows a data set with field customer and date, after we have executed the command "xtslide-k date -a nextvisit", a new column "nextvisit" will be appended as in the middle figure. The second record for customer A and third record for customer B on the nextvisit field are null because the last purchase date for the customer has been shifted to the cell above. Records with null values are removed as shown in the figure on the right, you may specify the parameter -n if you wanna keep the null values in the record.
After we have prepared the work flow, let's see how the actual script is written below.
#/bin/bash #=============================================================== # MUSASHI bash script #=============================================================== #---- Title title="Average intervisit days for each customer" #---- Comment comment="Basic Reports" #---- Variables inPath="/home/public/tutorial" #--------------------------------------------------------------- # Command #--------------------------------------------------------------- xtdelnul -f customer -i $inPath/dat.xt | xtcut -f customer,date | xtuniq -k customer,date | xtsort -k customer,date | xtslide -k customer -f date:nexvisit | xtcal -c 'day($nextvisit,$date)' -a intervistdays | xtagg -k customer -f intervisitdays:avgintervisitdays -c avg | xtcut -f customer,avgintervisitdays | xtheader -l "$title" -c "$comment" -o inter.xt #=============================================================== |
Check your results to make sure the commands are properly executed.
The following are additional exercises, try them out and check your results with the scripts and output given below.
Report Name | Script Name | Result (xt) | Result (html) |
The largest intervisit days for each customer | inter1.sh | inter1.xt | inter1.html |
The smallest intervisit day(s) for each customer | inter2.sh | inter2.xt | inter2.html |