Basic Reports: Basket Purchase

In this lesson, we will create a report on the 20 most popular 4-digit categories brought with seasoning based on the same artificial data used in "Basic Commands".

The output

Top 20 categories by quantity purchased along with seasonings

Tutorial

Rank 4-digit Category Category Name/TH> Quantity
1 1403 Soft drink 2490
2 1301 Confectionery 2133
3 1406 Alcoholic beverage 1957
4 1401 Seasoning 1247
5 1105 Frozen food 1223
6 1107 Starch 1183
7 1110 Dairy products 1147
8 1104 Boiled fish paste 1138
9 1116 Processed fish 1062
10 1121 Canned food 1048
11 1108 Noodles 1044
12 1112 Oatmeals 996
13 1111 Cooking oil 869
14 1102 Soup 817
15 1106 Pickles and soy dishes 722
16 1117 Salted food 696
17 1118 Ice Cream 686
18 1304 Alcoholic products 624
19 1407 Processed meats 615
20 1115 ‰ÁH“÷—Þ 595

Creating the report

This report is very similar to the report in the previous lesson "Top 20 most popular brand purchased". The only difference is that we will be examining products purchased together.

Here's the four key considerations in market basket analysis:

  1. The definition of "simultaneous purchase"
  2. The definition of target basket items
  3. Determining items purchased simultaneously with target basket items
  4. Determining the unit of items purchased simultaneously for each case

"Simultaneous purchase" is defined as purchases done together and the items are listed on one receipt. Depending on the purpose, items that the customer purchase within the same week can also be considered as simultaneous purchase. In this lesson, let's restrict the unit of the market basket as the items purchased on the same receipt.

Next, let's identify the target basket items to be included in the analysis. The process of exploring the most popular basket items in purchases without any specific target items, this analysis technique is known as market basket, it comes from the idea that a customer places all items into a shopping cart. There are certain properties to determine and sort out target basket items, which includes colors, or package unit. Other properties are brand, manufacturer, are commonly used in conjunction. In this lesson, let's pick "seasoning" (1101) to be our target basket item.

The items purchased together with the target basket item on the same unit have a lot of interesting applications. For example, it can be used to find out the most popular items purchased with the target items in the same basket to find out correlation different products or the brand image. In this lesson, we will focus on the 4-digit categories purchased along with the target item.

Finally, when selecting the unit of simultaneous purchases, there are two options to choose from, "quantity" and "number of occurance". Number of occurance counts the items purchased together on one receipt as a case. The quantity of item purchased will be counted as is shown on the receipt. In this lession, let's use the number of occurance as the unit of accounting.

The process flow

Most commands used in this lesson is covered in Basic Techniques, and new command "xtselstr" is used to select the transactions containing the target basket item based on the same key. Below is the process flow to creating the report.

Answer: The process

  1. Select receipts containing purchases in seasoning category(category code - 1101)
  2. Remove transcations containing purchases in seasoning category
  3. Remove duplicate transactions on the same item purchased
  4. Select the necessary "4-digit category"
  5. Count the number of cases for each 4-digit category.
  6. Add the category name, and ranking, then select the 20 best categories in the market basket sold.

Script

After we have thought out and prepare for the report by going through the above process, let's see how the actual script is written below.
#/bin/bash
#===============================================================
# MUSASHI bash script
#===============================================================

#---- Title
title="The top 20 categories purchased together with the target item seasoning"

#---- Comment 
comment="Basic Commands"

#---- Variables 
inPath="/home/public/tutorial"

#---------------------------------------------------------------
# Commands 
#---------------------------------------------------------------
xtselstr -k Date,ReceiptNumber -f CategoryCode4 -v 1101 -i $inPath/dat.xt |
xtselstr -f CategoryCode4 -v 1101 -r |
xtcut    -f CategoryCode4,Quantity   |
xtagg    -k CategoryCode4 -f Quantity -c sum |
xtjoin   -k CategoryCode4 -m $inPath/gicfs4.xt -f CategoryCode4Name |
xtbest   -R MIN_20 -s Quantity%n%r                        |
xtnumber -a Rank                                          |
xtcut    -f Rank,CategoryCode4,CategoryCode4Name,Quantity |
xtheader -l "$title" -c "$comment" -o basket.xt
#===============================================================

Check your results to make sure the commands are properly executed.

Exercises

The following are more sample reports pertained to the market basket analysis. Check your resutls with the scripts and output given below.

Report name Script name Result (xt) Result (html)
Repeat the example above and find out the 6-digit basket categories purchased, use number of occurance as a unit basket1.sh basket1.xt basket1.html
Repeat the example above, and find out the 4-digit basket categories purchased under processed food category (category code 11) basket2.sh basket2.xt basket2.html
Find out the best 20 categories purchased with isotonic drink (140323) based on the number of occurance basket3.sh basket3.xt basket3.html