Basic Commands
Lesson 2: Aggregation (xtagg) Part I

The xtagg command sums up the numeric values of an attribute to derive a total. It aggregates the records (lines) for the fields that the user specifies in the field argument (-f); these fields are the basis for the summation.
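To make the idea concrete, here is a minimal sketch of keyed summation written in plain bash and awk, not MUSASHI. It assumes whitespace-separated text records with the key in column 1 and numeric values in columns 2 and 3, similar to the rows inside the CDATA body shown in Step 3. The function name sum_by_key and the sample rows are hypothetical, and unlike xtagg's sample output, the sketch simply keeps keys in the order it first sees them.

```shell
#!/bin/bash
# Hypothetical helper, not part of MUSASHI: sum columns 2 and 3 per key in column 1.
# Input is plain whitespace-separated text on stdin.
sum_by_key() {
  awk '
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }  # remember first-seen key order
      qty[$1] += $2                                           # running total of column 2
      amt[$1] += $3                                           # running total of column 3
    }
    END {
      for (i = 1; i <= n; i++) print order[i], qty[order[i]], amt[order[i]]
    }'
}

# Hypothetical sample records: Date Quantity Amount
printf '%s\n' \
  '20020101 3 1200' \
  '20020101 2 800' \
  '20020102 5 1500' | sum_by_key
# 20020101 5 2000
# 20020102 5 1500
```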

Summary of options and usage: xtagg


Using xtagg

Step 1: Editing your script

Start FD. Duplicate the script "xtcut.sh" from the previous lesson: select xtcut.sh and press "c+CTRL". When you are prompted for a new name, enter "xtagg.sh".

copy the file xtcut.sh as a new name on the current directory
new name : xtagg.sh
cp /home/public/lesson/basic/xtcut.sh /home/public/lesson/basic/xtagg.sh

Goal: Calculate the total quantity and the total dollar amount sold for each unique date.

Methodology: Use xtcut to select "Date", "Quantity" and "Amount", and pass the result to xtagg through a pipe ("|"). xtagg then computes the total quantity and total amount for each unique "Date". Its output is piped to xtheader, where you define a new title and comment as needed and update the -o parameter so that the result is written to the file "xtagg.xt".

Step 2: Defining Attributes and Options

Key - -k Date
Note: The key is a column or set of columns that uniquely identifies each row of the aggregated result. Date is the key attribute here: all records with the same Date form one unit of aggregation.

Attributes - -f Quantity,Amount
Note: The -f argument lists the attributes on which the summation is performed. Separate multiple field names with commas only; do not insert spaces between them.
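One likely reason for the no-spaces rule is ordinary shell word splitting: the shell, not xtagg, breaks an unquoted command line at spaces, so "-f Quantity, Amount" reaches the command as three separate arguments and Amount is no longer part of the -f value. The sketch below uses a hypothetical count_args function in place of xtagg, only to show how the arguments arrive.

```shell
#!/bin/bash
# Hypothetical stand-in for a command such as xtagg: it only reports
# how many arguments it received and what they were.
count_args() {
  echo "$# arguments:"
  for arg in "$@"; do
    echo "  [$arg]"
  done
}

count_args -f Quantity,Amount    # 2 arguments: [-f] [Quantity,Amount]
count_args -f Quantity, Amount   # 3 arguments: [-f] [Quantity,] [Amount]
```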

Type - -c sum
Note: "-c" defines the type of aggregation to perform on the attributes. This example uses "sum". Refer to xtagg for the complete list of aggregation types.
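The choice of aggregation type changes only the formula applied to each group. As a rough illustration in plain awk (not MUSASHI), the hypothetical function agg_by_key below takes the type as its argument and supports sum, avg, max and min, the kinds of reports used in the exercises at the end of this lesson. The type names are chosen for this sketch and may not match xtagg's -c values; the sketch aggregates column 2 grouped by column 1.

```shell
#!/bin/bash
# Hypothetical helper, not part of MUSASHI: aggregate column 2 per key in column 1.
# $1 selects the aggregation: sum, avg, max or min (names chosen for this sketch).
agg_by_key() {
  awk -v type="$1" '
    {
      k = $1; v = $2 + 0
      if (!(k in cnt)) {              # first record seen for this key
        order[++n] = k
        total[k] = 0; hi[k] = v; lo[k] = v
      }
      cnt[k]++
      total[k] += v
      if (v > hi[k]) hi[k] = v
      if (v < lo[k]) lo[k] = v
    }
    END {
      for (i = 1; i <= n; i++) {
        k = order[i]
        if (type == "sum")      r = total[k]
        else if (type == "avg") r = total[k] / cnt[k]
        else if (type == "max") r = hi[k]
        else if (type == "min") r = lo[k]
        else { print "unknown type: " type > "/dev/stderr"; exit 1 }
        print k, r
      }
    }'
}

data=$(printf '%s\n' 'A 4' 'A 6' 'B 3')
echo "$data" | agg_by_key sum   # A 10, then B 3
echo "$data" | agg_by_key avg   # A 5, then B 3
```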

Your script should look as follows:

#!/bin/bash
#===============================================================
# MUSASHI bash script
#===============================================================

#---- title
title="Tutorial"

#---- comment
comment="xtagg"

#---- variables
inPath="/home/public/tutorial"

#---------------------------------------------------------------
# commands
#---------------------------------------------------------------
# xtcut keeps Date, Quantity and Amount; xtagg sums Quantity and Amount
# for each Date; xtheader sets the title and comment and writes xtagg.xt.
xtcut -f Date,Quantity,Amount -i $inPath/dat.xt |
xtagg -k Date -f Quantity,Amount -c sum |
xtheader -l "$title" -c "$comment" -o xtagg.xt
#===============================================================

Step 3: Running the script

When you are done, save and run the script. The contents of "xtagg.xt" should look as follows:

<?xml version="1.0" encoding="euc-jp"?>
<xmltbl version="1.00">
<header>
<title>
Tutorial
</title>
<comment>
xtagg
</comment>
<field no="1">
<name>Date</name>
</field>
<field no="2">
<name>Quantity</name>
</field>
<field no="3">
<name>Amount</name>
</field>
</header>
<body><![CDATA[
20020101 161 60034
20020102 40 13959
20020103 155 62402
20020104 107 41467
20020105 52 21283
20020106 106 43070
20020107 87 31458
20020108 98 40726
20020109 152 61779
20020110 144 51501
20020111 150 62140
20020112 177 69727
20020113 120 46533
20020114 75 30533
20020115 138 54459

Redefining Attributes

Besides setting the title and comment, xtheader can also rename attributes, but this requires passing the attribute names to it as arguments. It is more efficient to rename the attributes in xtagg, where the attribute names are already specified.

Let us rename the attributes "Quantity" and "Amount" to "TotalQuantity" and "TotalAmount". In the -f argument, write the attribute name used in the sample data set, followed by ":", then the new attribute name. Keep in mind that there must be no spaces within the argument or the attribute names. The modified script is shown below:

#!/bin/bash
#===============================================================
# MUSASHI bash script
#===============================================================
#---- Title
title="Tutorial"

#---- Comment
comment="xtagg"

#---- variables
inPath="/home/public/tutorial"
#---------------------------------------------------------------
# commands
#---------------------------------------------------------------
xtcut -f Date,Quantity,Amount -i $inPath/dat.xt |
xtagg -k Date -f Quantity:TotalQuantity,Amount:TotalAmount -c sum |
xtheader -l "$title" -c "$comment" -o xtagg.xt
#===============================================================

Let us check the new attribute names in "xtagg.xt":

<?xml version="1.0" encoding="euc-jp"?>
<xmltbl version="1.00">
<header>
<title>
Tutorial
</title>
<comment>
xtagg
</comment>
<field no="1">
<name>Date</name>
</field>
<field no="2">
<name>TotalQuantity</name>
</field>
<field no="3">
<name>TotalAmount</name>
</field>
</header>
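The OldName:NewName form in -f is easy to take apart mechanically. The plain-bash sketch below splits a -f value at commas and then at the first colon, to show which part is the existing attribute and which is the new name; it illustrates the syntax only and says nothing about how xtagg parses it internally. The function name show_renames is hypothetical, and an entry without a colon is treated as keeping its name, as in the first script of this lesson.

```shell
#!/bin/bash
# Hypothetical illustration, not MUSASHI code: print "old -> new" for each
# comma-separated entry of a -f value written as OldName:NewName.
show_renames() {
  local entry old new
  local -a entries
  IFS=',' read -r -a entries <<< "$1"   # split the value at commas
  for entry in "${entries[@]}"; do
    old=${entry%%:*}                    # text before the first colon
    if [[ $entry == *:* ]]; then
      new=${entry#*:}                   # text after the first colon
    else
      new=$old                          # no colon: the name is kept
    fi
    echo "$old -> $new"
  done
}

show_renames "Quantity:TotalQuantity,Amount:TotalAmount"
# Quantity -> TotalQuantity
# Amount -> TotalAmount
```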


Using xtstatistics

The command xtstatistics works in the same way as xtagg; the difference is in the type of statistics calculated, since xtstatistics computes the variance and the standard deviation. Sample usage is shown below:

xtagg -k Date -f Quantity,Amount -c var -i dat.xt -o varianceStat.xt
xtagg -k Date -f Quantity,Amount -c std -i dat.xt -o stdStat.xt
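Variance and standard deviation are computed per key, just like a sum. The sketch below is plain awk, not MUSASHI; the function name var_std_by_key and the sample values are hypothetical. It uses the population formula (dividing by n). This lesson does not say whether the MUSASHI commands divide by n or by n-1, so compare against their reference before relying on exact values.

```shell
#!/bin/bash
# Hypothetical helper, not part of MUSASHI: per-key variance and standard
# deviation of column 2, grouped by column 1.
# Uses the population formula (divide by n); a sample variance divides by n-1.
var_std_by_key() {
  awk '
    {
      k = $1; v = $2 + 0
      if (!(k in n)) order[++m] = k
      n[k]++; s[k] += v; ss[k] += v * v
    }
    END {
      for (i = 1; i <= m; i++) {
        k = order[i]
        mean = s[k] / n[k]
        var = ss[k] / n[k] - mean * mean      # E[x^2] - (E[x])^2
        if (var < 0) var = 0                  # guard against rounding error
        printf "%s %.4f %.4f\n", k, var, sqrt(var)
      }
    }'
}

printf '%s\n' 'A 2' 'A 4' 'A 4' 'A 4' 'A 5' 'A 5' 'A 7' 'A 9' | var_std_by_key
# A 4.0000 2.0000
```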

Exercises

Apply xtagg to produce the following reports, and check your results against the scripts and output files listed below.

Script name  Output file (xt)  Output file (html)  Report description
xtagg1.sh    xtagg1.xt         xtagg1.html         Average quantity and amount per date
xtagg2.sh    xtagg2.xt         xtagg2.html         Maximum quantity and amount per date
xtagg3.sh    xtagg3.xt         xtagg3.html         Minimum quantity and amount per date
xtagg4.sh    xtagg4.xt         xtagg4.html         Total quantity and amount per 2-digit classification code
xtagg5.sh    xtagg5.xt         xtagg5.html         Average quantity and amount per 2-digit classification code
xtagg6.sh    xtagg6.xt         xtagg6.html         Total quantity and amount per 4-digit classification code per manufacturer
xtagg7.sh    xtagg7.xt         xtagg7.html         Average quantity and amount per 4-digit classification code per manufacturer
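Exercises 6 and 7 group by two attributes at once, which is the "set of columns" case mentioned in the Key note of Step 2. As a rough plain-awk illustration (not MUSASHI, and not a solution to the exercises), the hypothetical function sum_by_two_keys below treats the pair of columns 1 and 2 as a single key; the column layout and sample rows are invented for this sketch.

```shell
#!/bin/bash
# Hypothetical helper, not part of MUSASHI: sum column 3 per composite key
# made of column 1 and column 2.
sum_by_two_keys() {
  awk '
    {
      k = $1 " " $2                      # join both key columns into one group id
      if (!(k in total)) order[++n] = k
      total[k] += $3
    }
    END {
      for (i = 1; i <= n; i++) print order[i], total[order[i]]
    }'
}

# Hypothetical rows: Code Maker Quantity
printf '%s\n' '0101 M1 3' '0101 M2 4' '0101 M1 2' '0202 M1 1' | sum_by_two_keys
# 0101 M1 5
# 0101 M2 4
# 0202 M1 1
```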