xtagg

Section: User Commands (1)
Updated: 2002-10-26

 

NAME

xtagg - aggregation

 

SYNOPSIS

xtagg -f attribute(s) -c {sum|avg|max|min|cnt} [-k key attribute(s)] [-q] [-i INPUT] [-o OUTPUT] [-z] [-t] [-T TEMP DIRECTORY]

 

DESCRIPTION

This command aggregates the numerical values of an attribute to derive the sum, average, count, maximum or minimum. The attribute(s) to aggregate are given with the -f option, and the type of aggregation with -c. The key attribute(s) given with the -k option define the groups of records (lines) over which each value is calculated. If -k is not specified, the calculation is performed over all records in the data. The attributes given in -k and -f must be different. The input does not need to be sorted by the key attributes before running xtagg; the output is sorted by the attribute(s) specified in -k.
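
The behaviour described above can be sketched in Python. This illustrates the aggregation semantics only and is not xtagg's implementation; the function name aggregate and the (key, value) record layout are assumptions made for the example. The rows are taken from the dat.xt sample in the USAGE section.

```python
from collections import defaultdict


# Hypothetical helper illustrating the semantics; not part of MUSASHI.
def aggregate(records, how):
    """Group (key, value) pairs by key and aggregate the values.

    how is one of "sum", "avg", "max", "min", "cnt".
    Returns a list of (key, result) pairs sorted by key.
    """
    ops = {
        "sum": sum,
        "avg": lambda vs: sum(vs) / len(vs),
        "max": max,
        "min": min,
        "cnt": len,
    }
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    # Input order does not matter; the output is ordered by key.
    return [(key, ops[how](groups[key])) for key in sorted(groups)]


# (CustomerID, Amount) pairs from dat.xt, deliberately unsorted.
rows = [
    ("A00052", 249), ("A00005", 1504), ("A00052", 357),
    ("A00005", 1875), ("A00005", 365), ("A00052", 446),
    ("A00005", 810), ("A00005", 491),
]

print(aggregate(rows, "sum"))  # [('A00005', 5045), ('A00052', 1052)]
print(aggregate(rows, "cnt"))  # [('A00005', 5), ('A00052', 3)]
```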

 

PARAMETERS

-k key attribute(s)
Key attribute(s) on which the aggregation is based. (If omitted, all records are treated as having the same key value.)
-f attribute(s)
Attribute(s) on which the aggregation is performed. The result can be stored under a new attribute name by specifying:
-f attributeName:calculatedAttributeName
-c computation
Specifies the type of aggregation to perform. One of the following:
sum summation
cnt count
avg average
max maximum value among the records sharing the same key value
min minimum value among the records sharing the same key value
-q sequential processing
When this option is used with the -k parameter, the command processes the input records in their original sequence instead of sorting them by the key attribute(s) given in -k.
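
One way to picture -q is as aggregation over consecutive runs of equal keys, with no sort step. The sketch below assumes that reading; the manual only states that records are processed in their original sequence. The name aggregate_sequential is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


# Hypothetical illustration of sequential (unsorted) processing.
# Assumption: each consecutive run of equal keys forms one group.
def aggregate_sequential(records):
    """Sum values over consecutive runs of records sharing a key."""
    return [
        (key, sum(value for _, value in run))
        for key, run in groupby(records, key=itemgetter(0))
    ]


rows = [("B", 1), ("B", 2), ("A", 5), ("B", 10)]
# "B" appears in two separate runs, so it yields two output rows.
print(aggregate_sequential(rows))  # [('B', 3), ('A', 5), ('B', 10)]
```

Under this assumption, input that is already grouped by key gives the same totals as the sorted mode while skipping the sort.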

 

FILE OPTIONS

-i input filename
If the filename ends in '.gz', the command decompresses the file before processing it. When "-i" is not specified, the command reads from standard input.
-o output filename
If the filename ends in '.gz', the command writes the output data in compressed form. When "-o" is not specified, the result is sent to standard output.
-T temp file directory
The directory for temporary files used by this command.
-z zip archive
Compress the standard output. This applies when "-o" is not given and "-z" is specified.
-t plain text
xtagg treats the input and output data as plain text.
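
The suffix-based handling of -i and -o can be sketched as follows. This is an illustration under the assumption that '.gz' files are gzip-compressed; open_input and open_output are hypothetical helpers, not part of MUSASHI.

```python
import gzip
import os
import sys
import tempfile


# Hypothetical helpers; assume '.gz' means gzip compression.
def open_input(path=None):
    """Return a text stream: stdin if no path, gzip reader for '.gz'."""
    if path is None:
        return sys.stdin
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def open_output(path=None):
    """Return a text stream: stdout if no path, gzip writer for '.gz'."""
    if path is None:
        return sys.stdout
    if path.endswith(".gz"):
        return gzip.open(path, "wt")
    return open(path, "w")


# Round trip through a temporary '.gz' file.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "rsl.xt.gz")
    with open_output(target) as out:
        out.write("A00005 5045 13\n")
    with open_input(target) as src:
        print(src.read(), end="")  # A00005 5045 13
```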

 

USAGE

Input file - dat.xt:
<field no="1">
<name>CustomerID</name>
<sort priority="1">
</sort>
</field>
<field no="2">
<name>Date</name>
<sort priority="2">
</sort>
</field>
<field no="3">
<name>Amount</name>
</field>
<field no="4">
<name>Quantity</name>
</field>
</header>
<body><![CDATA[
A00005 20020918 1504 4
A00005 20020918 1875 5
A00005 20020918 365 1
A00005 20020918 810 2
A00005 20020923 491 1
A00033 20020618 1389 3
A00033 20020618 1656 4
A00033 20020618 183 1
A00033 20020618 305 1
A00033 20020618 501 1
A00033 20020618 576 1
A00052 20020216 249 1
A00052 20020216 357 1
A00052 20020216 446 1
A00053 20020106 233 1
A00053 20020106 586 1
A00053 20020208 429 1
A00053 20020427 2004 4
A00053 20020427 362 1
A00053 20020427 435 1

Example 1. Calculate the total amount and quantity each customer purchased.
e.g. xtagg -k CustomerID -f Quantity,Amount -c sum -i dat.xt -o rsl.xt
Output file - rsl.xt:


<field no="1">
<name>CustomerID</name>
<sort priority="1">
</sort>
</field>
<field no="2">
<name>Date</name>
</field>
<field no="3">
<name>Amount</name>
</field>
<field no="4">
<name>Quantity</name>
</field>
</header>
<body><![CDATA[
A00005 20020923 5045 13
A00033 20020618 4610 11
A00052 20020216 1052 3
A00053 20020427 4049 9
A00056 20021128 2362 5

Example 2. Calculate the total quantity and total amount for each customer and rename the attributes.
e.g. xtagg -k CustomerID -f Quantity:TotalQuantity,Amount:TotalAmount -c sum -i dat.xt -o rsl.xt
Output file - rsl.xt:


<field no="1">
<name>CustomerID</name>
<sort priority="1">
</sort>
</field>
<field no="2">
<name>Date</name>
</field>
<field no="3">
<name>TotalAmount</name>
</field>
<field no="4">
<name>TotalQuantity</name>
</field>
</header>
<body><![CDATA[
A00005 20020918 5045 13
A00033 20020618 4610 11
A00052 20020216 1052 3
A00053 20020106 4049 9
A00056 20021128 2362 5
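
The attribute:newName form used in Example 2 can be read as a comma-separated list of optional renames. A minimal parsing sketch follows; parse_field_spec is a hypothetical helper, and treating an entry without a colon as keeping its original name is inferred from Example 1.

```python
# Hypothetical parser for a -f style argument; not xtagg's own code.
def parse_field_spec(spec):
    """Map each source attribute to its output name.

    An entry "name:newName" renames the result; a bare "name"
    keeps the original attribute name (assumption from Example 1).
    """
    mapping = {}
    for item in spec.split(","):
        source, _, renamed = item.partition(":")
        mapping[source] = renamed or source
    return mapping


print(parse_field_spec("Quantity:TotalQuantity,Amount:TotalAmount"))
# {'Quantity': 'TotalQuantity', 'Amount': 'TotalAmount'}
```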

 

DIAGNOSTICS

If any value of the attribute given in the -f argument is null, the aggregated value will be null. To avoid this, delete the lines containing null values with the "xtdelnul" command, or replace the null values with 0 using the "xtnulto" command. With "-c cnt", null values are not counted. After aggregation, attributes other than the ones specified in -k and -f may be meaningless.
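
The null rules above can be sketched as follows, using Python's None to stand in for a null value. The function names are hypothetical and the code illustrates the stated behaviour only; it does not call xtdelnul or xtnulto.

```python
# None stands in for a null value; helper names are hypothetical.
def null_aware_sum(values):
    """Return None if any value is null, otherwise the sum."""
    if any(v is None for v in values):
        return None
    return sum(values)


def null_aware_count(values):
    """Count only the non-null values."""
    return sum(1 for v in values if v is not None)


amounts = [249, None, 446]
print(null_aware_sum(amounts))    # None
print(null_aware_count(amounts))  # 2

# Dropping null lines, or replacing nulls with 0, before aggregating.
print(null_aware_sum([v for v in amounts if v is not None]))     # 695
print(null_aware_sum([0 if v is None else v for v in amounts]))  # 695
```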

 

SEE ALSO

xtcount(1), xtstatistics(1)

For complete documentation and tutorials for xtagg and other commands, please visit http://musashien.sourceforge.net.

 

BUG REPORT

If you find a bug in xtagg, please send an e-mail to musashi@adm.osaka-sandai.ac.jp. Before sending a bug report, please verify that you have the latest version of MUSASHI, and read this manual carefully to ensure the error is not caused by a quirk in the language.

 

AUTHORS

Yukinobu Hamuro, Naoki Katoh, Katsutoshi Yada, Stephane Cheung


 


This document was created by man2html, using the manual pages.
Time: 22:43:52 GMT, June 24, 2003