Basic Commands
Lesson 13: Join (xtjoin)

At times, data are stored in separate sources, and you need to join two or more data sets to retrieve the information you need. This lesson covers how MUSASHI merges such files with the xtjoin command.

Summary of options and usage: xtjoin


Using xtjoin

Goal: Join the file containing the total quantity and amount for each 4-digit classification code with the classification code master file gicfs4.xt.

Methodology: Select "CategoryCode4", "Quantity" and "Amount" with xtcut, then calculate the total quantity and amount for each 4-digit classification code with xtagg. Next, add the category code description to the file you have created by joining the 4-digit classification code master file. Pipe the output to xtheader, define a new title and comment as needed, update the -o parameter, and write the result to the file "xtjoin.xt".

Replicate the script "xtcut.sh" used in the previous lesson and name the new script "xtjoin.sh".

Copy the file xtcut.sh under a new name in the current directory
New name: xtjoin.sh
cp /home/public/lesson/basic/xtcut.sh /home/public/lesson/basic/xtjoin.sh

Specify the parameters as follows:

Key - -k CategoryCode4
Note: The 4-digit classification code is the common attribute in both files, so it is the key for joining the two files.
Field - -f Code4Desc
Note: This parameter defines the field(s) in the master file to be joined to the dataset. Its argument should be the attribute you want to add to the dataset. The format of data in the master file is as follows:
CategoryCode4 Code4Desc
1101 seasoning
1102 cooking oil
Master - -m gicfs4.xt
Note: This parameter defines the name of the master file.
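To make the roles of the key, field and master parameters concrete, here is a minimal sketch of a key-based lookup using standard awk rather than MUSASHI. It is an analogy only, not how xtjoin is implemented. The file names totals.txt and master.txt are hypothetical plain-text stand-ins, and the sketch assumes that rows with no matching key in the master are dropped, which this lesson does not state for xtjoin.

```shell
#!/bin/bash
# Illustration only: plain-text stand-ins (hypothetical names), not MUSASHI .xt files.
workdir=$(mktemp -d)
printf '1101 6247 2388930\n1102 1515 546193\n' > "$workdir/totals.txt"   # key, quantity, amount
printf '1101 seasoning\n1102 cooking oil\n'    > "$workdir/master.txt"   # key, description

# First pass (master): remember the description for each key.
# Second pass (totals): append the description when the key is found.
joined=$(awk '
  NR == FNR   { key = $1; sub(/^[^ ]+ /, ""); desc[key] = $0; next }
  $1 in desc  { print $0, desc[$1] }
' "$workdir/master.txt" "$workdir/totals.txt")

echo "$joined"
rm -rf "$workdir"
```

The first column plays the role of the -k key, the description column plays the role of the -f field, and master.txt plays the role of the -m master file.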

Your script will look like:
#!/bin/bash
#===============================================================
# MUSASHI bash script
#===============================================================

#---- title
title="Tutorial"

#---- comment
comment="xtjoin"

#---- variables
inPath="/home/public/tutorial"

#---------------------------------------------------------------
# command
#---------------------------------------------------------------
xtcut -f CategoryCode4,Quantity,Amount -i $inPath/dat.xt |
xtagg -k CategoryCode4 -f Quantity:TotalQuantity,Amount:TotalAmount -c sum |
xtjoin -k CategoryCode4 -m $inPath/gicfs4.xt -f Code4Desc |
xtheader -l "$title" -c "$comment" -o xtjoin.xt
#===============================================================

When you are done, save and execute the script. The result should look as follows:

<?xml version="1.0" encoding="euc-jp"?>
<xmltbl version="1.00">
<header>
<title>
Tutorial
</title>
<comment>
xtjoin
</comment>
<field no="1">
<name>CategoryCode4</name>
</field>
<field no="2">
<name>Quantity</name>
</field>
<field no="3">
<name>Amount</name>
</field>
<field no="4">
<name>Code4Desc</name>
</field>
</header>
<body><![CDATA[
1101 6247 2388930 Seasonings
1102 1515 546193 Cooking Oil
1103 365 145373 Spreadings
1104 2173 849791 Daily Products
1105 2249 860320 Cooking Condiments

One Point: Compressed files
MUSASHI can handle compressed files. When the file name given to the "-i" parameter has the extension ".gz", the file is treated as compressed. This also applies to the master file given to the "-m" parameter of the xtjoin command. You can also write the output in compressed format by specifying ".gz" as the file extension in the "-o" parameter. When the data are passed through standard input, specifying the "-z" parameter causes the output file defined by the "-o" parameter to be written in compressed format.
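To inspect such a file outside MUSASHI, the standard gzip tool can be used. The sketch below assumes that the ".gz" files follow the ordinary gzip format, which the extension suggests but this lesson does not confirm. The file master.txt is a hypothetical stand-in, not a real MUSASHI file.

```shell
#!/bin/bash
# Illustration only: a hypothetical plain-text stand-in file.
workdir=$(mktemp -d)
printf '1101 seasoning\n1102 cooking oil\n' > "$workdir/master.txt"

# Compress a copy, keeping the original for comparison.
gzip -c "$workdir/master.txt" > "$workdir/master.txt.gz"

# Decompress to standard output and compare with the original content.
restored=$(gzip -dc "$workdir/master.txt.gz")
original=$(cat "$workdir/master.txt")
if [ "$restored" = "$original" ]; then status="identical"; else status="different"; fi

echo "$status"
rm -rf "$workdir"
```

The same `gzip -dc` call can be pointed at a compressed output file to view its contents without unpacking it on disk.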


Exercises

Let's apply xtjoin to the following reports. Check your results against the scripts and output files given below.

Report Description | Script name | Output file (xt) | Output file (html)
Total quantity and amount for each 2-digit category code, joined with the 2-digit category code name | xtjoin1.sh | xtjoin1.xt | xtjoin1.html
Total quantity and amount for each 6-digit category code, joined with the 6-digit category code description | xtjoin2.sh | xtjoin2.xt | xtjoin2.html