Basic commands
Lesson 1: Selecting attributes (xtcut)

MUSASHI is made up of a set of commands that perform data transformation. These commands can be executed from the command prompt like standard Unix commands. Although a dataset can be processed by executing a combination of several commands at the command prompt, we should create a shell script that executes the commands.

In this lesson, you will learn how to use the xtcut command. The xtcut command selects and cuts out the specified attribute(s) (columns) of a data set, so that you keep the information relevant to your purpose and filter out extraneous attributes.

Summary of options and usage: xtcut

Now let's get started!


Using xtcut

Step 1: Creating your first script with FD

Go to your working directory /home/public/lesson/basic and press "S" in the FD editor to generate a script template. Enter the name of your new script, "xtcut", and press RETURN. The format of the commands is predefined in the script template; simply modify the template to create your own customized script.


Figure 1.

A new script file xtcut.sh has been created.


Step 2: Editing your script with FD

At the FD interface, highlight the new script xtcut.sh, then press "e" to start the vi editor.

A script sample is shown in Figure 2 below. The first line specifies that the script is run by the bash shell, which executes the MUSASHI commands that follow. Comments are defined with a "#" at the beginning of the line. Adding comments and blank lines between sections as you compose the script improves readability when you revisit the script in the future.


Figure 2.

Set the title and comment of the data at title="" and comment="" as in Figure 2. These character strings are stored in variables, which are then passed to the commands.

Enter "Tutorial" for title="" and "xtcut" for comment="". Next, define the path of the dataset at "inPath":

/home/public/tutorial

One Point: Why is the input file path stored in a shell variable?
The directory path of the input file is defined in the variable inPath. Alternatively, we can enter the path name directly when executing the command. Defining the path at the top of the script is recommended because it makes clear which dataset is being used.

Later in this tutorial, you will come across cases where more than one input file is required. In such cases, defining the location in "inPath" shows at a glance where the files come from, which makes the script easier to read.

It also makes future maintenance easier if the path of the input file needs to be redefined, and it helps other users understand the script.
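
As a minimal sketch of this idea, the script below defines a path variable once and reuses it for two input files. A temporary directory stands in for /home/public/tutorial so the sketch runs anywhere, and the second file name (other.xt) is hypothetical.

```shell
#!/bin/bash
# Stand-in directory; in the lesson this would be
# inPath="/home/public/tutorial".
inPath="$(mktemp -d)"
printf 'sample\n' > "$inPath/dat.xt"
printf 'other\n'  > "$inPath/other.xt"   # hypothetical second input file

# The path is written once; every reference below picks it up by expansion.
first="$inPath/dat.xt"
second="$inPath/other.xt"
echo "input 1: $first"
echo "input 2: $second"
```

If the data moves to another directory, only the inPath line needs to change.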


Step 3: Defining Attributes and Options

There are a total of 18 data attributes in dat.xt. Let's extract the attributes "Date", "Quantity", and "Amount" with the xtcut command.

IMPORTANT NOTE: Attribute names and file names are case sensitive in MUSASHI. Be sure that the attributes given as command parameters match the field names of the data set (dat.xt).

Specify the parameters as follows:

Attributes - -f Date,Quantity,Amount
Note: In this example, the parameter extracts Date, Quantity and Amount. Attribute names are case sensitive, and there must be no spaces between the field names.

Input file - -i $inPath/dat.xt
Note: $inPath is the variable defined at the top of the script that holds the location of the data, /home/public/tutorial. You may also define the path and file name together at inPath: /home/public/tutorial/dat.xt

Output file - -o xtcut.xt
Note: The output file xtcut.xt is created in the current directory, /home/public/lesson/basic. Be sure to give the file the extension ".xt".
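
One reason to avoid spaces in the field list is the way the shell splits a command line into arguments before the command ever sees it. The sketch below shows only this shell behaviour; count_args is a hypothetical helper, not a MUSASHI command.

```shell
#!/bin/bash
# count_args is a hypothetical helper that reports how many arguments it got.
count_args() {
  echo "$#"
}

without_spaces=$(count_args -f Date,Quantity,Amount)
with_spaces=$(count_args -f Date, Quantity, Amount)

echo "without spaces: $without_spaces arguments"   # -f plus one field list
echo "with spaces:    $with_spaces arguments"      # field list split in three
```

With spaces, the field list reaches the command as three separate words instead of a single value for -f.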

Delete the vertical line "|" at the rightmost part of the "xtcut" line. Be sure that xtheader is commented out with "#" at the beginning of the line:

xtcut -f fieldName -i $inPath/dat.xt
#xtheader -l "$title" -c "$comment" -o dat.xt

The resulting script will look like:
#!/bin/bash
#===============================================================
# MUSASHI bash script
#===============================================================

#---- Title
title="Tutorial"

#---- Comment
comment="xtcut"

#---- variables
inPath="/home/public/tutorial"

#---------------------------------------------------------------
# commands
#---------------------------------------------------------------
xtcut -f Date,Quantity,Amount -i $inPath/dat.xt -o xtcut.xt
#xtheader -l "$title" -c "$comment" -o dat.xt
#===============================================================

Double-check for errors, then save the script and quit the editor with the following command; you will be brought back to the main screen:

:wq

Step 4: Running the script

Highlight xtcut.sh in the file directory and execute xtcut.sh with uppercase "X". You will be prompted for the following options:
1:run, 2:run in background, 9:cancel ->
Enter "1" to run the script
The following message will then appear:
running...#OK# xtcut -f Date,Quantity,Amount -i /home/bear/tutorial/dat.xt -o xtcut.xt in=38733 out=38733 20030221 163455

When the command is done, you will see the message "complete" at the end.


Step 5: Dealing with Errors

Scenario 1: If the message "complete" does not appear for a long time, an error might have occurred. Terminate the process with "Ctrl+c" and edit the script again.

Scenario 2: If you see "#NG#" or <ERROR> while the script is running, an error has occurred. Go back and revise the script. In the example below, the field name "date" does not match the attribute "Date" in the data set.


running...#NG# xtcut -f date,Quantity,Amount -i /home/bear/tutorial/dat.xt -o xtcut.xt "field name not found : [date]"
complete(log message was stored in "log" file)
Hit any key.
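
The "#NG#" marker can also be searched for after a run. The following is a minimal sketch, assuming the "log" file holds the same #OK#/#NG# lines that appear on screen; it builds a sample log from the two messages shown in this lesson so that it can run without MUSASHI installed.

```shell
#!/bin/bash
# Sample log built from the two run-time messages shown in this lesson.
# Assumption: the real "log" file contains lines in this same form.
logfile="$(mktemp)"
cat > "$logfile" <<'EOF'
#OK# xtcut -f Date,Quantity,Amount -i /home/bear/tutorial/dat.xt -o xtcut.xt in=38733 out=38733 20030221 163455
#NG# xtcut -f date,Quantity,Amount -i /home/bear/tutorial/dat.xt -o xtcut.xt "field name not found : [date]"
EOF

# Count the failed commands, then print them.
ng_count=$(grep -c '#NG#' "$logfile")
echo "failed commands: $ng_count"
grep '#NG#' "$logfile"
```

In your working directory, you would point the two grep commands at the file named log instead of the sample file.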

Step 6: The Result

When the script has run successfully, a new file "xtcut.xt" will be created. Check the content of the file and you will see a new data set with three attributes: Date, Quantity and Amount, as shown in the figure below.

Creating the header

The MUSASHI xtheader command allows us to modify the title, comment and field information between the header tags of the .xt data set. Let's revisit xtcut.sh and modify the xtheader command that was commented out.

Summary of options and usage: xtheader


Step 1:

Commands can be chained using the pipe "|". Put the "|" back at the end of the xtcut line. This sends the output of the xtcut command to xtheader as its input instead of writing it directly to xtcut.xt.
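
The sketch below shows how a "|" at the end of a line continues the pipeline onto the next line in bash. Standard commands (printf and tr) are used as stand-ins so that it runs without MUSASHI; in the lesson's script the two commands are xtcut and xtheader.

```shell
#!/bin/bash
# Stand-in commands: printf produces the data, tr transforms it.
# The "|" at the end of the first line tells bash that the command
# on the next line receives the output as its input.
result=$(
  printf 'date,amount\n' |
  tr 'a-z' 'A-Z'
)
echo "$result"
```

Because bash expects another command after a trailing "|", the pipe has to be removed whenever the following command is commented out.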

Step 2:

Remove "-o xtcut.xt" from the xtcut command line and define the output file "dat.xt" at the end of the xtheader command line.

Step 3:

Uncomment the xtheader command, i.e., delete the "#" at the beginning of the line. The output of xtcut is then passed to xtheader on the next line through the pipe "|" instead of being written into xtcut.xt.

Step 4:

The parameters "-l" and "-c" are used in xtheader to specify the title and the comment, respectively. In Step 2 of this lesson, you should have defined "$title" and "$comment".

#===============================================================
# MUSASHI bash script
#===============================================================

#---- Title
title="Tutorial"

#---- Comment
comment="xtcut"

#---- variables
inPath="/home/bear/tutorial"

Step 5:

Check that the parameter -o of xtheader writes the output file to "dat.xt".
xtcut -f Date,Quantity,Amount -i $inPath/dat.xt |
xtheader -l "$title" -c "$comment" -o dat.xt

Let us save and execute this script. Open the file dat.xt, and you will notice that the title and the comment have been changed.

One Point: Importance of title and comment
A properly defined title and comment help us identify the output file. Using xtheader is entirely optional, but it is good practice to always add a title and a comment for future reference.

One Point: Message at run-time
Every MUSASHI command executed and the messages shown at run-time are saved in the log file in the same directory. The log contains information on which command was used, its parameters, the input file, the output file, and any errors that occurred.

One Point: -i and -o
There are two ways to specify input and output file names for MUSASHI commands:
1. Use the parameters -i and -o
2. Use UNIX standard input and output

For example, the command "xtcut -f date -i input.xt -o output.xt" can be replaced by "xtcut -f date <input.xt >output.xt", which gives the same result. The only difference is that when -i and -o are used, the input and output file names are recorded in the log file.
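
To illustrate the two calling styles without MUSASHI installed, the sketch below uses demo_cut, a hypothetical stand-in function that keeps the first comma-separated column of a plain text file. It is not how xtcut is implemented and it does not read .xt data; it only shows that parameters and redirection can lead to the same output.

```shell
#!/bin/bash
# demo_cut is a hypothetical stand-in, not a MUSASHI command. It reads from
# the file given with -i (or standard input) and writes to the file given
# with -o (or standard output).
demo_cut() {
  local src="" dst="" opt
  local OPTIND=1
  while getopts "i:o:" opt; do
    case "$opt" in
      i) src="$OPTARG" ;;
      o) dst="$OPTARG" ;;
    esac
  done
  if [ -n "$src" ]; then cut -d, -f1 "$src"; else cut -d, -f1; fi |
    if [ -n "$dst" ]; then cat > "$dst"; else cat; fi
}

workdir="$(mktemp -d)"
printf 'Date,Amount\n20030221,100\n' > "$workdir/input.txt"

# Style 1: file names passed as parameters.
demo_cut -i "$workdir/input.txt" -o "$workdir/out1.txt"
# Style 2: shell redirection of standard input and output.
demo_cut < "$workdir/input.txt" > "$workdir/out2.txt"

cmp "$workdir/out1.txt" "$workdir/out2.txt" && echo "same result"
```

With redirection, the shell opens the files and the command never sees their names, which is consistent with the file names being logged only when -i and -o are used.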

One Point: Suffix
The suffix of a file name, such as ".xt", is not always required. It is attached by convention simply to differentiate between types of file.
Although a script file without the ".sh" extension is still executable under the bash shell (e.g. ./xtcut), the script must have the suffix ".sh" in order to be executed under FD.
MUSASHI is also able to recognize xmlTable data without any problem.

Exercise

Using a script, extract different attributes from the sample data file and create the output files shown in the following table.

Report name                                   Script name  Output file (xt)                       Output file (html)
Extract manufacturer, brand and gross profit  xtcut1.sh    xtcut1.xt (omitted due to large size)  omitted due to large size
Extract customer ID and receipt number        xtcut2.sh    xtcut2.xt (omitted due to large size)
