Basic Commands
Lesson 11: Concatenating Records (xtcat)

When processing data, we often encounter situations where a large data set is stored as multiple files divided by time, category, or other attributes. The xtcat command concatenates the records of multiple files into a single data set, so that they can be processed together in one pipeline.

Summary of options and usage: xtcat


Using xtcat

Goal: Merge all data sets for the year 2002.

Methodology: Duplicate the script "xtagg.sh" and name the new script "xtcat.sh". Use the xtcat command to merge the 12 data sets for 2002, which are located in the directory /home/public/tutorial, then extract "date", "quantity" and "amount" with xtcut. Use xtagg to compute the total quantity and amount for each unique "date". Define a new title and comment with xtheader, and change the -o parameter to write the result to the file "xtcat.xt".

Specify the parameters as follows:

Input: -i $inPath/'dat2002*.xt.gz'
Note: the -i parameter specifies the input files to be merged. The files can be listed separated by commas, but listing every file quickly becomes cumbersome, so a wildcard can be used to shorten the list instead. Because the wildcard is meant to be interpreted by the xtcat command, put single quotes around the file name: otherwise bash expands the wildcard itself before xtcat runs, whereas the quotes pass the pattern to xtcat unchanged.
#!/bin/bash
#===============================================================
# MUSASHI bash script
#===============================================================

#---- title
title="Tutorial"

#---- comment
comment="xtcat"

#---- variables
inPath="/home/public/tutorial"

#---------------------------------------------------------------
# command
#---------------------------------------------------------------
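# concatenate the 12 data sets for 2002, keep Date, Quantity and Amount,
# sum Quantity and Amount for each Date, then set the header and write xtcat.xt
xtcat -i $inPath/'dat2002*.xt.gz' |
xtcut -f Date,Quantity,Amount |
xtagg -k Date -f Quantity:TotalQuantity,Amount:TotalAmount -c sum |
xtheader -l "$title" -c "$comment" -o xtcat.xt
#===============================================================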

Compare the result with the output of the xtagg lesson and check whether the two are the same.
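The single quotes around 'dat2002*.xt.gz' in the -i parameter keep bash from expanding the wildcard itself. The sketch below uses only standard bash and empty stand-in files to show the difference: unquoted, the pattern becomes twelve separate arguments before any command runs; quoted, it arrives as one literal argument that the receiving command (here, xtcat) can expand on its own. The file names dat200201.xt.gz to dat200212.xt.gz and the helper count_args are hypothetical, chosen only for this demonstration; xtcat itself is not called.

```shell
#!/bin/bash
# Demonstrates shell quoting only; xtcat itself is not called.
# Assumption: hypothetical monthly file names dat200201.xt.gz ... dat200212.xt.gz.

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

# Create twelve empty stand-in files.
for m in 01 02 03 04 05 06 07 08 09 10 11 12; do
  : > "$demo_dir/dat2002$m.xt.gz"
done

# Hypothetical helper: prints how many arguments it was given.
count_args() {
  echo "$#"
}

# Unquoted: bash expands the wildcard, so the command receives 12 file names.
unquoted=$(count_args $demo_dir/dat2002*.xt.gz)

# Single-quoted (as in xtcat.sh): the pattern is passed unchanged as one
# argument, leaving the expansion to the receiving command.
quoted=$(count_args $demo_dir/'dat2002*.xt.gz')
quoted_text=$(printf '%s' $demo_dir/'dat2002*.xt.gz')

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted argument ($quoted_text)"
```

The variable part ($demo_dir, like $inPath in the lesson's script) is still expanded by bash in both cases; only the quoted characters are protected.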

One Point: Directory index
With the xtcat command, the directory structure can serve as an index. For example, suppose you want to select the sales data of a particular store for a particular month (say, store A in January 2002). If the data are kept in separate files by store and by year/month, so that the data for store A in January 2002 are in ./A/200201, selecting by directory is much faster than keeping all data in one file and selecting with xtsel, because only the files for the requested store and month have to be read.
The selection itself is done by giving xtcat a wildcard pattern. For instance, if you specify -i './[ABCD]/20021?/dat.xt.gz', xtcat merges the data for October, November and December 2002 from stores A, B, C and D.
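To check which files a pattern such as ./[ABCD]/20021?/dat.xt.gz selects before handing it to xtcat, you can let bash expand it once over a throwaway copy of the layout. The sketch below follows the ./store/yyyymm/dat.xt.gz layout described above, but the store list (A to D) and the month list (200201 to 200212) are assumptions, and the files are empty placeholders.

```shell
#!/bin/bash
# Builds a stand-in <store>/<yyyymm>/dat.xt.gz tree and lists what the pattern selects.
# Assumption: stores A-D, months 200201-200212; files are empty placeholders.

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT

for store in A B C D; do
  for m in 01 02 03 04 05 06 07 08 09 10 11 12; do
    mkdir -p "$root/$store/2002$m"
    : > "$root/$store/2002$m/dat.xt.gz"
  done
done

# Let bash expand the pattern here only to inspect the selection; "?" matches
# exactly one character, so 20021? covers 200210, 200211 and 200212.
# When calling xtcat, keep the pattern in single quotes instead.
selected=( "$root"/[ABCD]/20021?/dat.xt.gz )

for f in "${selected[@]}"; do
  echo "${f#"$root"/}"    # print the path relative to the stand-in root
done
echo "${#selected[@]} files selected"
```

Four stores times three months gives twelve files, from A/200210/dat.xt.gz to D/200212/dat.xt.gz; none of the January to September directories are touched.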


Exercises

Apply xtcat to create the following reports, and check your results against the scripts and output files listed below.

Report description | Script name | Output file (xt) | Output file (html)
Total quantity and amount for each day from April to June | xtcat1.sh | xtcat1.xt | xtcat1.html
Total quantity and amount for each day in January, March and May | xtcat2.sh | xtcat2.xt | xtcat2.html
Total quantity and amount for each day from April to June and from October to December | xtcat3.sh | xtcat3.xt | xtcat3.html
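For these reports, the main change to xtcat.sh is the -i pattern (plus the -o file name). The sketch below checks candidate patterns against empty stand-in files, assuming, as an unverified guess, that the monthly files are named dat200201.xt.gz to dat200212.xt.gz; the helper months_matched is hypothetical. Bracket expressions such as [4-6] work the same way as [ABCD] in the directory example. The third report needs two patterns; whether xtcat accepts them comma-separated in one -i parameter is an assumption to check in its usage summary, and listing the six files explicitly, separated by commas, is always possible.

```shell
#!/bin/bash
# Checks candidate -i patterns for the exercises against stand-in files.
# Assumption: hypothetical monthly file names dat200201.xt.gz ... dat200212.xt.gz.

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
for m in 01 02 03 04 05 06 07 08 09 10 11 12; do
  : > "$dir/dat2002$m.xt.gz"
done

# Hypothetical helper: prints the month (MM) of every file a pattern matches.
months_matched() {
  local f
  local -a out=()
  for f in "$dir"/$1; do
    [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
    f=${f##*/dat2002}         # strip the directory and the "dat2002" prefix
    out+=("${f%.xt.gz}")      # strip the ".xt.gz" suffix, leaving MM
  done
  echo "${out[*]}"
}

ex1=$(months_matched 'dat20020[4-6].xt.gz')   # April to June
ex2=$(months_matched 'dat20020[135].xt.gz')   # January, March and May
# April to June plus October to December: two patterns
ex3="$(months_matched 'dat20020[4-6].xt.gz') $(months_matched 'dat20021[0-2].xt.gz')"

echo "xtcat1: $ex1"
echo "xtcat2: $ex2"
echo "xtcat3: $ex3"
```

In the script itself, each pattern would go in single quotes after $inPath/, exactly like 'dat2002*.xt.gz' in xtcat.sh.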