To create a report, we first need to define the report format, then compose a script to generate the results, and finally, publish the report.
In this lesson, we will create a report on the 20 most popular categories sold under 6-digit classification code based on the same artificial data used in "Basic Commands".
Rank | 6-digit classification code | Category name | Quantity |
---|---|---|---|
1 | 140323 | isotonic drink | 1233 |
2 | 130121 | candy, caramel | 930 |
3 | 140325 | nutritional drink | 860 |
4 | 110605 | instant (miso) soup | 797 |
5 | 110705 | frozen farm products | 757 |
6 | 111603 | tube-shaped fishcake | 757 |
7 | 130129 | rice cake | 741 |
8 | 111701 | pickles | 692 |
9 | 110209 | olive oil | 659 |
10 | 130305 | daily farm products | 643 |
11 | 140797 | alcoholic drinks | 621 |
12 | 110119 | cooking wine | 615 |
13 | 111605 | cake pounded fish | 615 |
14 | 130137 | Japanese sweets | 612 |
15 | 140601 | sake | 611 |
16 | 140211 | tomato juice | 603 |
17 | 111801 | tofu | 588 |
18 | 111207 | raw and boiled noodles | 578 |
19 | 130205 | yogurt | 571 |
20 | 111001 | wheat flour | 570 |
Let's use the commands learned in "Basic Commands" and apply them to rank and select the top 20 most popular categories, and publish the results by converting xmlTable to html file.
Before we start, let us sit back and think about how to process the data by selecting and combining the commands in order to produce the above report.The process:
This is one of the possible solutions, the commands used in this example can be used interchangably with other commands where appropriate. Please refer to "One Point" at the end of this lesson. Now, let's explain the logistics of steps 5, 6 and 8.
Th command xtbest selects the records with the same key value in order by the range specified at -R. If the key value is not defined, the selection of records will be based on all records in the data. The format of parameter at -R is: firstLine_LastLine . Note that xtbest takes into account the information in the sort tag and returns records in that ordering. The following are some examples:
Definition | Interpretation |
1_20 | line numbers from 1 to 20 |
1_20 | line numbers greater than 1 and are smaller than 20 |
5_10 | line numbers greater than or equal to 5 and are smaller than 20 |
MIN_15 | line numbers smaller than or equal to 15 |
20_MAX | line numbers greater than 20 |
MIN_MAX | all |
Solution: In our example, the command for choosing the top 20 lines is "xtbest -R MAX_20".
In order to give the sequential number for lines, we use xtnumber. This command sequentially numbers the records with the same key value. When -k is omitted, all records will be based on the same key value. The starting value is specified by parameter -S and the interval is specified by parameter -I. For example, the parameter "-S 10 -I 2", will turn out to be 10, 12, 14 and so forth. When neither -S nor -I are sepcified, the default value will be "-S 1 -I 1".
When -B is specified, the same number is assigned to the lines of the same
key value.
The attribute name forthe new number sequence is specified at -a.
Solution: To generate a sequence of numbers is done by "xtnumber -a order".
The command xt2html converts xmlTable file to html file. This command converts the content in the
tag in the xmlTable into a table in an html file. The attribute tilte and comments in theFD has a built in HTML conversion function. To publish the report, first highlight the data file in FD, then press "H". A submenu will pop up at the bottom, select the output format you would like to publish. Text format allows you to import data into spreadsheets and analytical tools such as SPSS / SAS, while HTML format will have the convenience to publish reports on the web and share with others.
We have briefly explained the procedure to create the script for this solution, let's see how the actual script is written below.
#/bin/bash #=============================================================== # MUSASHI bash script #=============================================================== #---- Title title="Top 20 6-digit categories sold" #---- Comment comment="Basic reports" #---- Variables inPath="/home/public/tutorial" #--------------------------------------------------------------- # Commands #--------------------------------------------------------------- xtcut -f CategoryCode6,Quantity -i $inPath/dat.xt | xtagg -k CategoryCode6 -f Quantity -c sum | xtjoin -k CategoryCode6 -m $inPath/gicfs6.xt -f CategoryCode6Name | xtbest -R MIN_20 -s Quantity%n%r | xtnumber -a Rank | xtcut -f Rank,CategoryCode6,CategoryCode6Name,Quantity | xtheader -l "$title" -c "$comment" -o best.xt #=============================================================== |
The result from the execution of script is shown below:
<?xml version="1.0" encoding="euc-jp"?> <xmltbl version="1.00"> <header> <title> Top 20 6-digit categories sold </title> <comment> Tutorial </comment> <field no="1"> <name>Rank</name> </field> <field no="2"> <name>CategoryCode6</name> <sort priority="2"> </sort> </field> <field no="3"> <name>CategoryCode6Name</name> <sort priority="3"> </sort> </field> <field no="4"> <name>Quantity</name> <sort priority="1"> </sort> </field> </header> <body><![CDATA[ 1 140323 isotonic drink 1233 2 130121 candy, caramel 930 3 140325 nutritional drink 860 4 110605 instant (miso) soup 797 5 110705 frozen farm products 757 6 111603 tube-shaped fishcake 757 7 130129 rice cake 741 8 111701 pickles 692 9 110209 olive oil 659 10 130305 daily farm products 643 11 140797 alcoholic drinks 621 12 110119 cooking wine 615 13 111605 cake pounded fish 615 14 130137 Japanese sweets 612 15 140601 Sake 611 16 140211 tomato juice 603 17 111801 tofu 588 18 111207 raw adn boiled noodles 578 19 130205 yogurt 571 20 111001 wheat flour 570 ]]></body> </xmltbl> |
The HTML file based on the script's name will be generated. Unlike text files which the results can be check from FD, you will need to use the web browser to check the results of the HTML report.
There is no fix way to arrange the order of the commands in certain circumstances. In the above example, reverse execution order of xtagg and xtcut will result the same. Further, the selection of the first 20 rows of records is done after joining category name, it can however be done after the selection of records. xtcal command can also be used in place of xtnumber. When composing the script, what's important is the order of commands, selection of commands, and decision on optimizing the processing efficiency in an ingenious manner.
For example, the amount of process data can be reduced by executing xtcut first before xtagg to increase the efficiency. Joining extra attributes to the data set after performing all necessary functions such as selecting the first 20 rows can save the extra processing done on the data which will eventually be filtered out and thus improve efficiency. Even though xtcal and xtnumber can carry out the same function, they both have different characteristics. xtcal has more functionality and xtnumber is more efficient in generating sequential numbers. |
The following are more sample reports pertained to the use of selecting the best records. Check your results with the scripts and output files given below.
Report Name | Script Name | Result - xt | Result - HTML |
Top 20 6-digit categories with the highest sales amount | best1.sh | best1.xt | best1.html |
Top 20 manufacturers' by unit sales | best2.sh | best2.xt | best2.html |
The 20 lowest sales amount in 6-digit categories | best3.sh | best3.xt | best3.html |
6-digit categories which made the top 20-30 | best4.sh | best4.xt | best4.html |