In this lesson, we will create a report on the 20 most popular 4-digit categories bought together with seasoning, based on the same artificial data used in "Basic Commands".
Top 20 categories by quantity purchased along with seasonings
Here are the four key considerations in market basket analysis:
A "simultaneous purchase" is defined as a set of items bought together and listed on one receipt. Depending on the purpose, items that a customer purchases within the same week can also be treated as a simultaneous purchase. In this lesson, let's restrict the unit of the market basket to the items purchased on the same receipt.
Next, let's identify the target basket items to be included in the analysis. Exploring the most popular combinations of purchased items without any specific target item is the technique known as market basket analysis; the name comes from the idea of a customer placing all items into a shopping cart. Target basket items can be defined and sorted by product properties such as color or package unit. Other properties, such as brand and manufacturer, are commonly used in conjunction. In this lesson, let's pick "seasoning" (1101) as our target basket item.
The items purchased together with the target basket item in the same unit have many interesting applications. For example, they can be used to find the most popular items purchased in the same basket as the target item, which helps reveal correlations between different products or the brand image. In this lesson, we will focus on the 4-digit categories purchased along with the target item.
Finally, when selecting the unit for counting simultaneous purchases, there are two options to choose from: "quantity" and "number of occurrences". The number of occurrences counts each receipt on which an item is purchased together with the target item as one case, regardless of how many were bought. Quantity counts the number of items purchased as shown on the receipt. In this lesson, let's use quantity as the unit of counting, which matches the script in this lesson, where the Quantity field is summed.
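To make the difference between the two counting units concrete, here is a minimal Python sketch. It is not MUSASHI code; the rows are invented for illustration, and only the field names (Date, ReceiptNumber, CategoryCode4, Quantity) are taken from the lesson's script.

```python
from collections import defaultdict

# Hypothetical rows: (Date, ReceiptNumber, CategoryCode4, Quantity).
# The values are invented; only the field names come from the lesson.
rows = [
    ("20020101", "0001", "1203", 3),
    ("20020101", "0001", "1405", 1),
    ("20020101", "0002", "1203", 1),
    ("20020102", "0001", "1405", 2),
]

quantity = defaultdict(int)        # sum of quantities per category
receipts_seen = defaultdict(set)   # receipts on which each category appears

for date, receipt, category, qty in rows:
    quantity[category] += qty                      # "quantity" unit
    receipts_seen[category].add((date, receipt))   # one case per receipt

# "Number of occurrences" unit: how many distinct receipts list the category.
occurrences = {c: len(keys) for c, keys in receipts_seen.items()}

print(dict(quantity))
print(occurrences)
```

In this sample, category 1203 has a quantity of 4 but only 2 occurrences, because the three units bought on the first receipt count as a single case.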
Most commands used in this lesson are covered in Basic Techniques; the new command "xtselstr" is used to select the transactions that contain the target basket item, based on a shared key. The process flow for creating the report is shown below.
Figure: the process flow for creating the report
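The key-based selection that this lesson uses "xtselstr" for can be pictured with the following Python sketch. It reflects one reading of the behavior described above (keep every row of a receipt if any row on that receipt matches the target), not the command's actual implementation. The function name `select_by_key` and the sample rows are hypothetical.

```python
def select_by_key(rows, key_fields, field, value):
    """Keep every row whose key group has at least one row with row[field] == value."""
    # First pass: collect the keys (receipts) that contain the target value.
    matching_keys = {
        tuple(row[f] for f in key_fields) for row in rows if row[field] == value
    }
    # Second pass: keep all rows belonging to those keys.
    return [row for row in rows if tuple(row[f] for f in key_fields) in matching_keys]


# Hypothetical sample rows; field names follow the lesson's script.
rows = [
    {"Date": "20020101", "ReceiptNumber": "0001", "CategoryCode4": "1101"},
    {"Date": "20020101", "ReceiptNumber": "0001", "CategoryCode4": "1203"},
    {"Date": "20020101", "ReceiptNumber": "0002", "CategoryCode4": "1203"},
    {"Date": "20020102", "ReceiptNumber": "0001", "CategoryCode4": "1405"},
]

selected = select_by_key(rows, ["Date", "ReceiptNumber"], "CategoryCode4", "1101")
for row in selected:
    print(row)
```

Only the two rows of receipt 0001 on 20020101 survive. The receipt with the same number on 20020102 is dropped, which suggests why the key combines Date and ReceiptNumber (assuming receipt numbers can repeat across dates).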
Having thought through and prepared the report by following the process above, let's see how the actual script is written.
    #!/bin/bash
    #===============================================================
    # MUSASHI bash script
    #===============================================================

    #---- Title
    title="The top 20 categories purchased together with the target item seasoning"

    #---- Comment
    comment="Basic Commands"

    #---- Variables
    inPath="/home/public/tutorial"

    #---------------------------------------------------------------
    # Commands
    #---------------------------------------------------------------
    xtselstr -k Date,ReceiptNumber -f CategoryCode4 -v 1101 -i $inPath/dat.xt |
    xtselstr -f CategoryCode4 -v 1101 -r |
    xtcut -f CategoryCode4,Quantity |
    xtagg -k CategoryCode4 -f Quantity -c sum |
    xtjoin -k CategoryCode4 -m $inPath/gicfs4.xt -f CategoryCode4Name |
    xtbest -R MIN_20 -s Quantity%n%r |
    xtnumber -a Rank |
    xtcut -f Rank,CategoryCode4,CategoryCode4Name,Quantity |
    xtheader -l "$title" -c "$comment" -o basket.xt
    #===============================================================
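For readers who want to check the logic independently of MUSASHI, the pipeline can be approximated in plain Python as below. This is a sketch under assumptions, not a description of how the commands work internally: it assumes the second "xtselstr" (with -r) drops the seasoning rows themselves, and that "xtbest" keeps the 20 rows with the largest Quantity. The sample rows, the `names` table (standing in for gicfs4.xt), and the function `top_categories` are hypothetical.

```python
from collections import defaultdict

TARGET = "1101"  # seasoning, the target basket item in this lesson

# Hypothetical rows: (Date, ReceiptNumber, CategoryCode4, Quantity).
rows = [
    ("20020101", "0001", "1101", 1),
    ("20020101", "0001", "1203", 2),
    ("20020101", "0001", "1405", 1),
    ("20020101", "0002", "1203", 5),  # no seasoning on this receipt
    ("20020102", "0001", "1101", 2),
    ("20020102", "0001", "1203", 1),
]

# Hypothetical category-name master, standing in for gicfs4.xt.
names = {"1203": "Category A", "1405": "Category B"}


def top_categories(rows, names, target, limit=20):
    # Receipts (Date, ReceiptNumber) that contain the target category.
    target_receipts = {(d, r) for d, r, c, _ in rows if c == target}

    # On those receipts, skip the target rows and sum Quantity per category.
    totals = defaultdict(int)
    for d, r, c, q in rows:
        if (d, r) in target_receipts and c != target:
            totals[c] += q

    # Sort by summed quantity, largest first, and keep the top `limit`.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]

    # Attach a rank and the category name.
    return [
        (rank, c, names.get(c, ""), q)
        for rank, (c, q) in enumerate(ranked, start=1)
    ]


report = top_categories(rows, names, TARGET)
for line in report:
    print(line)
```

The receipt without seasoning contributes nothing, so category 1203 totals 3 rather than 8. Comparing such a hand-checkable result against basket.xt on a small input is one way to confirm the script behaves as intended.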
Check your results to make sure the commands are properly executed.
The following are more sample reports related to market basket analysis. Check your results against the scripts and output given below.
Report name | Script name | Result (xt) | Result (html) |
Repeat the example above to find the 6-digit basket categories purchased, using the number of occurrences as the unit | basket1.sh | basket1.xt | basket1.html |
Repeat the example above to find the 4-digit basket categories purchased under the processed food category (category code 11) | basket2.sh | basket2.xt | basket2.html |
Find the top 20 categories purchased with isotonic drinks (140323), based on the number of occurrences | basket3.sh | basket3.xt | basket3.html |