xtagg
Section: User Commands (1)
Updated: 2002-10-26
NAME
xtagg - aggregation
SYNOPSIS
xtagg -f attribute(s) -c {sum|avg|max|min|cnt}
[-k key attribute(s)] [-q] [-i INPUT] [-o OUTPUT] [-z] [-t] [-T TEMP DIRECTORY]
DESCRIPTION
This command aggregates the numerical values of an attribute to derive the sum, average, count, maximum, or minimum. The attribute(s) to aggregate are given with the -f option, and the type of aggregation with the -c option.
Specify the key attribute(s) with the -k option to compute one aggregated value per key value. If -k is not specified, the calculation is based on all records in the data. The attributes given in -k and -f must be different.
The input does not need to be sorted by the key attributes before running xtagg. The output is sorted by the attribute(s) specified by -k.
PARAMETERS
- -k key attribute(s)
-
Key attribute(s) on which the aggregation is based. If omitted, all records are treated as having the same key value.
- -f attribute(s)
-
Attribute(s) on which the aggregation is performed. The result can be stored under a new attribute name using the form:
-
-f attributeName:calculatedAttributeName
- -c computation
-
Specifies the type of aggregation to perform. One of the following:
-
sum
summation
-
cnt
count
-
avg
average
-
max
maximum value among the records in the key attribute
-
min
minimum value among the records in the key attribute
- -q sequential processing
-
When this option is used with the -k parameter, the command processes the records in their original input order instead of sorting them by the key attribute given in -k.
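One plausible reading of -q can be sketched in Python: records are aggregated run by run in input order, so a key that reappears later in the data starts a new group. This is an assumption about the behaviour, not xtagg's actual implementation, and `aggregate_runs` is a hypothetical name used only for illustration.

```python
from itertools import groupby

def aggregate_runs(records, key_index, value_index):
    """Sum one column over consecutive records that share a key (no sorting).

    Assumption: this mimics one possible interpretation of -q; it is not
    taken from the xtagg source.
    """
    result = []
    # groupby only merges adjacent records with equal keys.
    for key, run in groupby(records, key=lambda rec: rec[key_index]):
        result.append((key, sum(int(rec[value_index]) for rec in run)))
    return result

rows = [("A", "1"), ("A", "2"), ("B", "5"), ("A", "4")]
print(aggregate_runs(rows, 0, 1))
```

Under this reading the second run of key "A" is reported separately, which is why sorted input (or omitting -q) would be needed to obtain one line per key.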
FILE OPTIONS
- -i input filename
-
If the filename ends in '.gz', the command decompresses the file before processing it. When "-i" is not specified, the command reads from standard input.
- -o output filename
-
If the filename ends in '.gz', the command automatically writes the output in compressed form. When "-o" is not specified, the result is sent to standard output.
- -T temp file directory
-
The directory for temporary files used by this command.
- -z zip archive
-
Compresses the data written to standard output. This applies when "-z" is specified and "-o" is not given.
- -t plain text
-
Treats the input and output data as plain text.
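The input selection described for -i can be illustrated with a minimal Python sketch. It assumes that a '.gz' suffix means gzip compression, and `open_input` is a hypothetical helper, not part of MUSASHI.

```python
import gzip
import sys

def open_input(path=None):
    """Return a text stream for reading records.

    Assumptions: no path means standard input, and a '.gz' suffix means
    the file is gzip-compressed.
    """
    if path is None:
        return sys.stdin
    if path.endswith(".gz"):
        # "rt" decodes the compressed bytes to text while reading.
        return gzip.open(path, "rt")
    return open(path, "r")
```

A caller would iterate over `open_input("dat.xt.gz")` line by line, exactly as it would over an uncompressed file.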
USAGE
Input file - dat.xt:
<field no="1">
<name>CustomerID</name>
<sort priority="1">
</sort>
</field>
<field no="2">
<name>Date</name>
<sort priority="2">
</sort>
</field>
<field no="3">
<name>Amount</name>
</field>
<field no="4">
<name>Quantity</name>
</field>
</header>
<body><![CDATA[
A00005 20020918 1504 4
A00005 20020918 1875 5
A00005 20020918 365 1
A00005 20020918 810 2
A00005 20020923 491 1
A00033 20020618 1389 3
A00033 20020618 1656 4
A00033 20020618 183 1
A00033 20020618 305 1
A00033 20020618 501 1
A00033 20020618 576 1
A00052 20020216 249 1
A00052 20020216 357 1
A00052 20020216 446 1
A00053 20020106 233 1
A00053 20020106 586 1
A00053 20020208 429 1
A00053 20020427 2004 4
A00053 20020427 362 1
A00053 20020427 435 1
Example 1. Calculate the total amount and quantity each customer purchased.
e.g. xtagg -k CustomerID -f Quantity,Amount -c sum -i dat.xt -o rsl.xt
Output file - rsl.xt
-
<field no="1">
<name>CustomerID</name>
<sort priority="1">
</sort>
</field>
<field no="2">
<name>Date</name>
</field>
<field no="3">
<name>Amount</name>
</field>
<field no="4">
<name>Quantity</name>
</field>
</header>
<body><![CDATA[
A00005 20020923 5045 13
A00033 20020618 4610 11
A00052 20020216 1052 3
A00053 20020427 4049 9
A00056 20021128 2362 5
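The grouping behind Example 1 can be sketched in Python as follows. This is an illustration only: it parses whitespace-separated body lines rather than the full dat.xt format, uses a subset of the sample rows, and `aggregate` and `FUNCS` are hypothetical names.

```python
from collections import defaultdict

# Rows copied from the dat.xt sample: CustomerID, Date, Amount, Quantity.
SAMPLE = """\
A00005 20020918 1504 4
A00005 20020918 1875 5
A00005 20020918 365 1
A00005 20020918 810 2
A00005 20020923 491 1
A00052 20020216 249 1
A00052 20020216 357 1
A00052 20020216 446 1
"""

# One function per value accepted by -c.
FUNCS = {
    "sum": sum,
    "cnt": len,
    "max": max,
    "min": min,
    "avg": lambda values: sum(values) / len(values),
}

def aggregate(records, key_index, value_index, how):
    """Group records by key, apply the chosen function, return rows sorted by key."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(int(rec[value_index]))
    return [(key, FUNCS[how](groups[key])) for key in sorted(groups)]

records = [line.split() for line in SAMPLE.splitlines()]
print(aggregate(records, 0, 2, "sum"))  # Amount per customer
print(aggregate(records, 0, 3, "sum"))  # Quantity per customer
```

For A00005 and A00052 the totals agree with rsl.xt: 5045 and 13, and 1052 and 3.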
Example 2. Calculate the total quantity and total amount for each customer and rename the attributes.
e.g. xtagg -k CustomerID -f Quantity:TotalQuantity,Amount:TotalAmount -c sum -i dat.xt -o rsl.xt
Output file - rsl.xt
-
<field no="1">
<name>CustomerID</name>
<sort priority="1">
</sort>
</field>
<field no="2">
<name>Date</name>
</field>
<field no="3">
<name>TotalAmount</name>
</field>
<field no="4">
<name>TotalQuantity</name>
</field>
</header>
<body><![CDATA[
A00005 20020918 5045 13
A00033 20020618 4610 11
A00052 20020216 1052 3
A00053 20020106 4049 9
A00056 20021128 2362 5
DIAGNOSTICS
If any value of an attribute given in the -f argument is null, the aggregated value will be null.
To avoid this, delete the lines containing null values with the "xtdelnul" command, or replace the null values with 0 using the "xtnulto" command. When aggregation is done with "-c cnt", null values are not counted.
After aggregation, attributes other than the ones given in -k and -f may be meaningless.
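The null rules can be expressed as a small Python sketch. Representing a null as `None` is an assumption made for illustration, since this page does not say how nulls appear in the data; the function names are hypothetical.

```python
def sum_with_nulls(values):
    """Return None if any value is null (None), otherwise the sum."""
    if any(v is None for v in values):
        return None
    return sum(values)

def count_non_null(values):
    """Count values, skipping nulls, as described for "-c cnt"."""
    return sum(1 for v in values if v is not None)

amounts = [233, None, 429]
print(sum_with_nulls(amounts))
print(count_non_null(amounts))
# Replacing nulls with 0 first (the role described for "xtnulto") gives a numeric sum.
print(sum_with_nulls([0 if v is None else v for v in amounts]))
```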
SEE ALSO
xtcount(1),
xtstatistics(1)
For complete documentation and tutorial of xtagg and other commands, please visit
http://musashien.sourceforge.net.
BUG REPORT
If you find a bug in xtagg, please send an electronic mail to
musashi@adm.osaka-sandai.ac.jp.
Before sending a bug report, please verify that you have the latest version of
MUSASHI.
Read this manual carefully to ensure the error is not caused by a quirk in the language.
AUTHORS
Yukinobu Hamuro, Naoki Katoh, Katsutoshi Yada, Stephane Cheung