xtjoin

Section: User Commands (1)
Updated: 2002-10-26

NAME

xtjoin - join fields of a reference file

SYNOPSIS

xtjoin -k key attribute(s) -m reference file -f attribute(s) [-K key attribute(s) in reference file] [-N] [-n] [-H] [-i INPUT] [-o OUTPUT] [-z] [-t] [-T TEMP DIRECTORY]

DESCRIPTION

This command joins attribute(s) from a reference file to an input file, using a key attribute that is present in both files. If the key attribute has the same name in the reference file and the input file, you need not specify it for the reference file with option -K. If the name of an attribute to be joined from the reference file already exists in the input file, you may rename it with ':' (e.g. -f attribute name:new attribute name). Note that the values of the key attribute given with -K must be unique in the reference file; alternatively, you can use xtnjoin for inner joins. An outer join can be performed with option -n, which also includes input rows that have no match in the reference file, and with option -N, which also includes reference rows that have no match in the input file. When a hash join (-H) is performed between the reference file and the input file, the output will not be sorted unless the input file is sorted by the key attribute given with -k.
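The join described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not xtjoin's implementation: the function name hash_join, the list-of-lists row format, and the positional key indices are all assumptions made for the example. It builds a lookup table from the reference rows, rejects duplicate reference keys, and keeps the input order.

```python
# Illustrative sketch only; not the actual xtjoin implementation.
# hash_join and its parameters are hypothetical names for this example.

def hash_join(rows, ref_rows, key_idx=0, ref_key_idx=0, ref_field_idx=1):
    """Append one reference field to every input row whose key has a match."""
    lookup = {}
    for ref in ref_rows:
        key = ref[ref_key_idx]
        if key in lookup:
            # The reference key is expected to be unique.
            raise ValueError(f"duplicate reference key: {key}")
        lookup[key] = ref[ref_field_idx]
    # Input order is kept as-is; unmatched input rows are dropped
    # (the behaviour without -n or -N).
    return [row + [lookup[row[key_idx]]]
            for row in rows if row[key_idx] in lookup]


# A few rows in the shape of the sample data shown under USAGE.
transactions = [
    ["A00004", "20021016", "11", "3410"],
    ["A00005", "20020918", "12", "4554"],
    ["A00006", "20020606", "3", "1364"],
]
master = [["A00001", "F"], ["A00005", "F"], ["A00006", "M"]]

joined = hash_join(transactions, master)
for row in joined:
    print(" ".join(row))
```

Because the lookup table never reorders anything, the result is sorted by key only when the input rows already are.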

PARAMETERS

-k key attribute(s): key attribute(s) for joining the files
-K key attribute(s) in the reference file: key attribute(s) in a reference file. Required when the key attribute name in the reference file is different from the one in the input file.
-m reference file: reference file name
-f attribute(s): the attribute(s) in the reference file to be joined to the input file
-H hash join: perform the join as a hash join
-N reference data not matched: rows from the reference file that do not match the input file will also be included in the output file.
-n input data not matched: rows from the input file that do not match the reference file will also be included in the output file.

FILE OPTIONS

-i input filename: if the suffix of the filename is '.gz', the command acts as a filter, extracting the compressed file for processing. The command will read standard input when "-i" is not specified.
-o output filename: if the suffix of the filename is '.gz', the command automatically returns the output data as a zip archive. When "-o" is not specified, the result will be sent to standard output.
-T temp file directory: the directory name for temporary files used in this command.
-z zip archive: compress the standard output to a zip archive. When the option "-o" is not given and "-z" is specified, the output will be compressed as a zip archive.
-t plain text: treat the input and output data as plain text format.
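
The suffix-based handling of -i, -o and -z can be sketched as follows. This is a minimal illustration, not xtjoin's code: the helper names open_input and open_output are hypothetical, and it assumes that the "zip archive" mentioned above means gzip compression, as the '.gz' suffix suggests.

```python
# Illustrative sketch only; open_input/open_output are hypothetical helpers.
# Assumption: "zip archive" in the option text means gzip ('.gz' suffix).
import gzip
import os
import sys
import tempfile


def open_input(path=None):
    """Text stream for reading: stdin without a path, gzip-decoded for '.gz'."""
    if path is None:
        return sys.stdin
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def open_output(path=None, compress=False):
    """Text stream for writing: stdout without a path, gzip-encoded for '.gz'."""
    if path is None:
        if compress:
            # Mirrors "-z" without "-o": compressed data on standard output.
            return gzip.open(sys.stdout.buffer, "wt", encoding="utf-8")
        return sys.stdout
    if path.endswith(".gz"):
        return gzip.open(path, "wt", encoding="utf-8")
    return open(path, "w", encoding="utf-8")


# Round trip through a temporary '.gz' file.
gz_path = os.path.join(tempfile.mkdtemp(), "rsl.xt.gz")
with open_output(gz_path) as out:
    out.write("A00005 20020918 12 4554 F\n")
with open_input(gz_path) as src:
    round_trip = src.read()
print(round_trip, end="")
```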

USAGE

Input file -dat.xt:
<field no="1">
<name>CustomerID</name>
</field>
<field no="2">
<name>Date</name>
</field>
<field no="3">
<name>TotalQuantity</name>
</field>
<field no="4">
<name>TotalAmount</name>
</field>
</header>
<body><![CDATA[
A00004 20020214 1 365
A00004 20020415 5 4349
A00004 20020625 3 5268
A00004 20020810 2 1805
A00004 20021014 2 612
A00004 20021016 11 3410
A00005 20020918 12 4554
A00005 20020923 1 491
A00006 20020606 3 1364
A00006 20020918 5 2195
]]></body>

Master file -master.xt
<field no="1">
<name>CustomerID</name>
</field>
<field no="2">
<name>Gender</name>
</field>
</header>
<body><![CDATA[
A00001 F
A00005 F
A00006 M
A00007 F
A00008 M
A00009 M
]]></body>

Example 1. Join customer gender information from the reference file to the transaction file.
e.g. xtjoin -k CustomerID -m master.xt -f Gender -i dat.xt -o rsl.xt

Output file -rsl.xt:
<field no="1">
<name>CustomerID</name>
</field>
<field no="2">
<name>Date</name>
</field>
<field no="3">
<name>TotalQuantity</name>
</field>
<field no="4">
<name>TotalAmount</name>
</field>
<field no="5">
<name>Gender</name>
</field>
</header>
<body><![CDATA[
A00005 20020918 12 4554 F
A00005 20020923 1 491 F
A00006 20020606 3 1364 M
A00006 20020918 5 2195 M
]]></body>

Example 2. Join all rows from the two files.
e.g. xtjoin -k CustomerID -m master.xt -f Gender -nN -i dat.xt -o rsl.xt

Output file -rsl.xt:
<field no="1">
<name>CustomerID</name>
</field>
<field no="2">
<name>Date</name>
</field>
<field no="3">
<name>TotalQuantity</name>
</field>
<field no="4">
<name>TotalAmount</name>
</field>
<field no="5">
<name>Gender</name>
</field>
</header>
<body><![CDATA[
A00001 * * * F
A00004 20020214 1 365 *
A00004 20020415 5 4349 *
A00004 20020625 3 5268 *
A00004 20020810 2 1805 *
A00004 20021014 2 612 *
A00004 20021016 11 3410 *
A00005 20020918 12 4554 F
A00005 20020923 1 491 F
A00006 20020606 3 1364 M
A00006 20020918 5 2195 M
A00007 * * * F
A00008 * * * M
A00009 * * * M
]]></body>
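
The effect of combining -n and -N in Example 2 can be sketched in Python. This is an illustration of the output shape only, not xtjoin's implementation: the function name outer_join and its keep_input and keep_ref parameters are hypothetical stand-ins for -n and -N, and the '*' placeholder for missing values is taken from the output shown above.

```python
# Illustrative sketch only; outer_join and its parameters are hypothetical.
NULL = "*"  # placeholder for missing values, as in the Example 2 output


def outer_join(rows, ref_rows, keep_input=True, keep_ref=True):
    """Join on the first field; optionally keep unmatched rows from either side."""
    lookup = {ref[0]: ref[1] for ref in ref_rows}
    matched = set()
    out = []
    for row in rows:
        key = row[0]
        if key in lookup:
            matched.add(key)
            out.append(row + [lookup[key]])
        elif keep_input:
            # Like -n: input row without a reference match.
            out.append(row + [NULL])
    if keep_ref:
        # Like -N: reference row without an input match.
        width = len(rows[0]) - 1 if rows else 0
        for key, value in lookup.items():
            if key not in matched:
                out.append([key] + [NULL] * width + [value])
    # Stable sort by key so rows with equal keys keep their input order.
    return sorted(out, key=lambda r: r[0])


transactions = [
    ["A00004", "20020214", "1", "365"],
    ["A00005", "20020918", "12", "4554"],
    ["A00006", "20020606", "3", "1364"],
]
master = [["A00001", "F"], ["A00005", "F"], ["A00006", "M"], ["A00007", "F"]]

result = outer_join(transactions, master)
for row in result:
    print(" ".join(row))
```

Calling outer_join with keep_input=False and keep_ref=False reduces the result to the matched rows only, which corresponds to Example 1.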

DIAGNOSTICS

Sort the data by the key attributes to ensure that the data sets are joined properly.

BUG REPORT

If you find a bug in xtjoin, please send an electronic mail to musashi@adm.osaka-sandai.ac.jp. Before sending a bug report, please verify that you have the latest version of MUSASHI. Read this manual carefully to ensure the error is not caused by a quirk in the language.

AUTHORS

Yukinobu Hamuro, Naoki Katoh, Katsutoshi Yada, Stephane Cheung

This document was created by man2html, using the manual pages.
Time: 22:43:54 GMT, June 24, 2003