Basic Reports: Days between store visits

In this lesson, we will create a report showing the average days between visits based on the artificial data used in "Basic Commands".

The Output

The average days between visits to the store

Tutorial

Customer Average Intervist Days
A00001 117
A00002 86.75
A00003 36.5
A00004 48.8
A00005 5
A00006 104
A00007 64.8
A00008 113
A00009 57.2
A00010 65.25
A00012 33.888889
A00014 39.2
A00015 30
A00016 85.5
A00017 21.583333
A00018 46.25
A00019 119
A00020 44.6
:
:

The process flow

Intervisit days is the number of days a customer visits to the store since the last visit. It is calculated by taking the average of intervisit days for all visits to the store. In order to calculate the difference between days, we will use a command to slide the current date over into a new column.

Below is the process flow from processing the raw data to generate the report, think through each step before proceeding to create the script.

Answer: Process flow

  1. Remove records with null customer ID
  2. Select the required attributes "customer" and "date"
  3. Remove duplicate records
  4. Slide the date field starting from the second row to a new column and name it as "next visit"
  5. Days between "Date" and "Next Visit" is calculated as intervisit days
  6. Calculate the average intervisit days for each customer
  7. Pick out the columns "Customer" and "Average intervist days" as the output

Next, let's discuss how the sliding of data works in step 4.

Slide all items up one cell (xtslide)

xtslide shifts cells on the field defined in -f and slides them to a new column based on the key field defined at -k. For example, the figure on the left belows shows a data set with field customer and date, after we have executed the command "xtslide-k date -a nextvisit", a new column "nextvisit" will be appended as in the middle figure. The second record for customer A and third record for customer B on the nextvisit field are null because the last purchase date for the customer has been shifted to the cell above. Records with null values are removed as shown in the figure on the right, you may specify the parameter -n if you wanna keep the null values in the record.

Script

After we have prepared the work flow, let's see how the actual script is written below.

#/bin/bash
#===============================================================
# MUSASHI bash script
#===============================================================

#---- Title
title="Average intervisit days for each customer"

#---- Comment
comment="Basic Reports"

#---- Variables
inPath="/home/public/tutorial"

#---------------------------------------------------------------
# Command
#---------------------------------------------------------------
xtdelnul -f customer -i $inPath/dat.xt |
xtcut -f customer,date |
xtuniq -k customer,date |
xtsort -k customer,date |
xtslide -k customer -f date:nexvisit |
xtcal -c 'day($nextvisit,$date)' -a intervistdays |
xtagg -k customer -f intervisitdays:avgintervisitdays -c avg |
xtcut -f customer,avgintervisitdays |
xtheader -l "$title" -c "$comment" -o inter.xt
#===============================================================

Check your results to make sure the commands are properly executed.

Exercises

The following are additional exercises, try them out and check your results with the scripts and output given below.

Report Name Script Name Result (xt) Result (html)
The largest intervisit days for each customer inter1.sh inter1.xt inter1.html
The smallest intervisit day(s) for each customer inter2.sh inter2.xt inter2.html