xtbucket

Section: User Commands (1)
Updated: 2002-10-26

NAME

xtbucket - partition data into uniform groups

SYNOPSIS

xtbucket -f attribute -n number of partitions [-k key attribute(s)] [-F output format{0|1|2}] [-r] [-i INPUT] [-o OUTPUT] [-z] [-t] [-T TEMP DIRECTORY]

DESCRIPTION

xtbucket partitions the values of a numerical attribute into a fixed number of subintervals (buckets) of roughly equal size, separately for each value of the key attribute. Processing time increases with the precision of the numbers, because more distinct values have to be distributed into buckets. Reducing the number of distinct values beforehand, by rounding off the decimals and digits of the attribute given to -f with the "xtcal" command, will greatly reduce processing time.
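The idea of distributing distinct values into equally filled buckets can be sketched in Python. This is only an illustration under stated assumptions, not the algorithm xtbucket actually uses: the function name bucketize and the rule "bucket = ceiling of (cumulative count * n / total)" are hypothetical choices, and the way xtbucket treats repeated values may differ.

```python
from collections import Counter


def bucketize(values, n):
    """Return a 1-based bucket index for each value (hypothetical helper).

    Equal values always share a bucket, and each bucket receives roughly
    the same number of records.
    """
    total = len(values)
    if total == 0:
        return []
    # Fewer records than requested buckets: use the smaller number.
    n = min(n, total)
    counts = Counter(values)
    bucket_of = {}
    seen = 0  # records with a value <= the current distinct value
    for v in sorted(counts):
        seen += counts[v]
        # Integer ceiling of seen * n / total, avoiding float rounding.
        bucket_of[v] = (seen * n + total - 1) // total
    return [bucket_of[v] for v in values]


# TotalAmount values of customer A00004 from dat.xt
print(bucketize([365, 4349, 5268, 1805, 612, 3410], 5))  # [1, 5, 5, 3, 2, 4]
```

The loop runs once per distinct value, which is why rounding the attribute first (fewer distinct values) shortens the work in a scheme like this one.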

PARAMETERS

-k key attribute(s): the key attribute(s) define the unit within which a partition is performed. When the -k option is omitted, all lines are treated as a single key unit.
-f attribute list: the attribute whose values are to be partitioned (only one attribute can be defined). To give the new bucket values an attribute name of their own, use "-f attributeName:NewAttributeName".
-n number of partitions: the number of buckets into which the numerical values are distributed. If the number of lines is smaller than the number of subintervals, the command uses the smaller value by default.
-F output format: the partition output format is specified as follows:
: -F 0 -- bucket values as whole numbers
: -F 1 -- bucket values as ranges
: -F 2 -- bucket values as ranges
-r reverse selection: assign bucket indexes so that a bucket with a smaller index holds larger values than a bucket with a larger index
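How -k, -n and -r interact can be sketched in Python. This is a rough illustration under assumptions, not xtbucket's implementation: the helpers bucketize and bucketize_by_key, the dictionary-based rows, and the equal-frequency rule used inside are all hypothetical.

```python
from collections import Counter, defaultdict


def bucketize(values, n, reverse=False):
    """1-based bucket index per value; reverse=True gives large values small indexes."""
    total = len(values)
    if total == 0:
        return []
    n = min(n, total)  # mirrors the -n rule: take the smaller number
    counts = Counter(values)
    bucket_of, seen = {}, 0
    for v in sorted(counts, reverse=reverse):
        seen += counts[v]
        bucket_of[v] = (seen * n + total - 1) // total  # integer ceiling
    return [bucket_of[v] for v in values]


def bucketize_by_key(rows, key, field, n, reverse=False):
    """Partition rows[field] separately within each value of rows[key]."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[key]].append(i)
    result = [None] * len(rows)
    for idxs in groups.values():
        buckets = bucketize([rows[i][field] for i in idxs], n, reverse)
        for i, b in zip(idxs, buckets):
            result[i] = b
    return result


rows = [
    {"CustomerID": "A00001", "TotalAmount": 2090},
    {"CustomerID": "A00001", "TotalAmount": 3038},
    {"CustomerID": "A00003", "TotalAmount": 1983},
    {"CustomerID": "A00003", "TotalAmount": 2898},
    {"CustomerID": "A00003", "TotalAmount": 4110},
]
print(bucketize_by_key(rows, "CustomerID", "TotalAmount", 5))
# [1, 2, 1, 2, 3]
print(bucketize_by_key(rows, "CustomerID", "TotalAmount", 5, reverse=True))
# [2, 1, 3, 2, 1]
```

Because each customer here has fewer than 5 lines, the number of buckets per key falls back to the number of lines in that key unit.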

FILE OPTIONS

-i input filename: if the filename ends in '.gz', the command acts as a filter and decompresses the file before processing. When "-i" is not specified, the command reads from standard input.
-o output filename: if the filename ends in '.gz', the command automatically writes the output data in compressed form. When "-o" is not specified, the result is sent to standard output.
-T temp file directory: specify the directory for temporary files used by this command.
-z zip archive: compress the standard output. This applies when "-z" is specified and the option "-o" is not given.
-t plain text: xtbucket treats the input and output data as plain text format.
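The suffix-based handling of -i and -o can be mimicked in Python when pre- or post-processing files for xtbucket. This is a minimal sketch that assumes '.gz' means the gzip format; the helper names open_input and open_output are hypothetical and not part of MUSASHI.

```python
import gzip
import sys


def open_input(path=None):
    """Open a text stream for reading; fall back to standard input."""
    if path is None:
        return sys.stdin
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # decompress transparently
    return open(path, "r")


def open_output(path=None):
    """Open a text stream for writing; fall back to standard output."""
    if path is None:
        return sys.stdout
    if path.endswith(".gz"):
        return gzip.open(path, "wt")  # compress transparently
    return open(path, "w")
```

A caller would close the returned stream itself unless it is standard input or standard output.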

USAGE

Input file - dat.xt:
<field no="1">
<name>CustomerID</name>
</field>
<field no="2">
<name>Date</name>
</field>
<field no="3">
<name>TotalQuantity</name>
</field>
<field no="4">
<name>TotalAmount</name>
</field>
</header>
<body><![CDATA[
A00001 20020826 5 2090
A00001 20021221 8 3038
A00002 20020112 1 341
A00002 20020208 12 4812
A00002 20020726 9 3379
A00002 20020822 10 4013
A00002 20021225 9 3532
A00003 20020727 8 1983
A00003 20020813 9 2898
A00003 20021008 11 4110
A00004 20020214 1 365
A00004 20020415 9 4349
A00004 20020625 13 5268
A00004 20020810 5 1805
A00004 20021014 2 612
A00004 20021016 11 3410

Example 1. Partition TotalAmount and TotalQuantity into 5 buckets.
e.g. xtbucket -f TotalAmount:AmountRange,TotalQuantity:QuantityRange -n 5 -i dat.xt -o rsl.xt
Output - rsl.xt:

: <field no="5">
<name>AmountRange</name>
</field>
<field no="6">
<name>QuantityRange</name>
</field>
</header>
<body><![CDATA[
A00001 20020826 5 2090 2 2
A00001 20021221 8 3038 3 3
A00002 20020112 1 341 1 1
A00002 20020208 12 4812 4 4
A00002 20020726 9 3379 3 3
A00002 20020822 10 4013 4 4
A00002 20021225 9 3532 3 3
A00003 20020727 8 1983 2 3
A00003 20020813 9 2898 3 3
A00003 20021008 11 4110 4 4

Example 2. Partition TotalAmount and TotalQuantity into 5 buckets and output each bucket as a range.
e.g. xtbucket -f TotalAmount:AmountRange,TotalQuantity:QuantityRange -n 5 -F 1 -i dat.xt -o rsl.xt
Output - rsl.xt:
: <body><![CDATA[
A00001 20020826 5 2090 1692_2701 4.5_6.5
A00001 20021221 8 3038 2701_3771 6.5_9.5
A00002 20020112 1 341 66_1692 1_4.5
A00002 20020208 12 4812 3771_5064.5 9.5_12.5
A00002 20020726 9 3379 2701_3771 6.5_9.5
A00002 20020822 10 4013 3771_5064.5 9.5_12.5
A00002 20021225 9 3532 2701_3771 6.5_9.5
A00003 20020727 8 1983 1692_2701 6.5_9.5
A00003 20020813 9 2898 2701_3771 6.5_9.5
A00003 20021008 11 4110 3771_5064.5 9.5_12.5

Example 3. Partition TotalAmount and TotalQuantity into 5 buckets within each value of the key attribute.
e.g. xtbucket -k CustomerID -f TotalAmount:AmountRange,TotalQuantity:QuantityRange -n 5 -i dat.xt -o rsl.xt
Output - rsl.xt:
: <body><![CDATA[
A00001 20020826 5 2090 1 1
A00001 20021221 8 3038 2 2
A00002 20020112 1 341 1 1
A00002 20020208 12 4812 5 4
A00002 20020726 9 3379 2 2
A00002 20020822 10 4013 4 3
A00002 20021225 9 3532 3 2
A00003 20020727 8 1983 1 1
A00003 20020813 9 2898 2 2
A00003 20021008 11 4110 3 3
A00004 20020214 1 365 1 1
A00004 20020415 9 4349 5 4
A00004 20020625 13 5268 5 5
A00004 20020810 5 1805 3 3
A00004 20021014 2 612 2 2
A00004 20021016 11 3410 4 5

BUG REPORT

If you find a bug in xtbucket, please send an e-mail to musashi@adm.osaka-sandai.ac.jp. Before sending a bug report, please verify that you have the latest version of MUSASHI, and read this manual carefully to make sure the error is not caused by a quirk in the language.

AUTHORS

Yukinobu Hamuro, Naoki Katoh, Katsutoshi Yada, Stephane Cheung

This document was created by man2html, using the manual pages.
Time: 22:43:52 GMT, June 24, 2003