Case Studies: Customer Attrition Analysis
Scenario: In this project, you will learn how to apply customer attrition analysis in a business scenario using MUSASHI. Customer attrition is often times a problem in most business, and the cost of acquiring a new customer is often higher than keeping an existing customer.
As a result, the knowledge of customer behavior and the reason for loss of customers provides crucial insights for companies to strategized new campaign to retain exisiting customers.
Your challenge: The management has recently noticed a decrease of customers visiting their department store. As a Database Marketing Analyst, you would like to make use of the data to find out the reason and explore the purchasing patterns of customers to learn if there are any flags or characteristics that would give them insights of customer attrition.
Methodology: After initial analysis of the data and careful selection of variables, you have determined that profit and idle time between visits are key predictors to their future visits. The initial target problem is to analyse buying patterns, and it will be appropriate to apply pattern analysis techniques on profit for each visit and idle time between visits.
Data Mining Tools: Besides the set of common data processing tools used in the previous tutorials, MUSASHI also have a data mining commmand - xtclassify which reads patterns and generate rules for the data set.
Data Format: Two datasets will be created each of which will have 3 main components: customer, pattern, and target.
Data Size:
No. of customers:
Observation Period: 20000101 - 20021231
Target Period: 20030101 - 20031231
Download sample dataset: dat.xt
Note: The dataset used in this case is an artifical data set, and it does not necessarily reflects real purchasing behavior of customers.
Definintion of Target Variables
The target variable tells whether the customer will continue to visit in the target period. "0" represents the customer will not be visiting again, whereas "1" represents the customer will continue to purchase in the department store for more than six times in the target period and that they have been making purchases for at least a year.
Creating Target Variables
Class "0" customers - Select customers whose date of last purchase falls before 20021201.
Your script should look like this:
#select dead customers
xtcut -f customer,date -i $inPath/dat.xt |
xtuniq -k customer,date |
xtagg -k customer -f date:endDate -c max |
xtsel -c '$endDate<20021201' |
xtcut -f customer |
xtsetchr -v 0 -a target -o xcustomerdead.xt
Class "1" customers - Select customers who visits more than 6 times in the target period and have visited the store for at least 2 years.
xtcut -f customer,date -i $inPath/dat.xt |
xtuniq -k customer,date | tee xmaster.xt
#select active customers who visits more than 6 times in target period
xtsel -c '$date>20021201' |
xtcount -k customer -a numtgVisit |
xtsel -c '$numtgVisit>=6' -o xcustomertemp.xt
#select active customer who have visited the store for at least 2 years
xtsel -c '$date<20021201' -i xmaster.xt |
xtagg -k customer -f date:startDate -c min |
xtjoin -k customer -m xendDate.xt -f endDate |
xtcal -c 'day($endDate,$startDate)' -a visitPeriod |
#select active customers who have visited for at least 2 years and have 6 or more visits in the target period
xtjoin -k customer -m xcustomertemp.xt -f numtgVisit |
xtcut -f customer |
xtsetchr -v 1 -a target -o xcustomeractive.xt
Note: When joining the master data set with customers visiting 6 more visits in the target period, the customers who did not visit more than 6 times and have visited for at least 2 years will be removed.
#concat the the two groups of customers to customer.xt
xtcat -i xcustomerdead.xt,xcustomeractive.xt |
xtheader -l "$title" -c "$comment" -o customer.xt
As a result, your sample dataset should look as follows:
<?xml version="1.0" encoding="euc-jp"?>
<xmltbl version="1.00">
<header>
<title>
Customer Master File
</title>
<comment>
Active and dead customers
</comment>
<name>customer</name>
</field>
<field no="2">
<name>target</name>
</field>
</header>
<body><![CDATA[
0001 0
0002 0
0003 0
0004 0
0005 0
0006 0
0007 0
0008 0
0009 0
0010 0
..
.
0995 1
0996 1
0997 1
0998 1
0999 1
1000 1
1001 1
1002 1
1003 1
1004 1
1005 1
1006 1
1007 1
1008 1
1009 1
1010 1
..
.
Definition of predicting variables
1. Pattern of profit per visit
First, find out the total profit per visit for each customers, then, use xtbucket command which clusters profit into definied groups and concat the clusters into a string of patterns by order. Afterwards, join the target attribute from the customer.xt dataset to this dataset, and partition 15% of the data as test dataset, and the rest as training data.
xtagg -k customer,date -f profit -c sum -i $inPath/dat.xt |
xtbucket -f profit:profitRange -n 4 |
xtpattern -k customer -s date -f profitRange:profitPattern |
xtjoin -k customer -m customer.xt -f target |
xtcut -f customer,profitPattern,target |
xtselrand -k target -p 15 -u profittrain.xt |
xtheader -l "$title" -c "$comment" -o profittest.xt
Now that you have created the trend of profit dataset, let's move onto making a time between visits trend dataset.
2. Pattern of idle time between visits
Find out the days between visits by first using xslide to append the column of next visits to the data, then calculating the difference of days between the first visit and the next visit, afterwards, cluster the idle time between visits and transform the clusters into patterns. Join the target attribute and create a training and test data set as you did previously.
xtcut -f customer,date -i $inPath/dat.xt |
xtuniq -k customer,date |
xtslide -k customer -s date -f date:nextVisit |
xtcal -c 'day($nextVisit,$date)' -a interVisit|
xtbucket -f interVisit:visitRang -n 4 |
xtpattern -k customer -s date -f visitRang:visitPattern |
xtjoin -k customer -m customer.xt -f target |
xtcut -f customer,visitPattern,target |
xtselrand -k target -p 15 -u visittrain.xt |
xtheader -l "$title" -c "$comment" -o visittest.xt
Application of Data Mining Techniques
To analysis string patterns, we can apply xtclassify which uses a technique called alphabet indexing in result generates a decision tree to classify positive and negative examples. Run the command for both datasets as shown below:
xtclassify -p visitPattern%n -c target -i visittrain.xt -I visittest.xt >visitResult.xt
xtclassify -p profitPattern%n -c target -i profittrain.xt -I profittest.xt >profitresult.xt
The Result
|