Basic Reports: Most popular category sold

To create a report, we first need to define the report format, then compose a script to generate the results, and finally, publish the report.

In this lesson, we will create a report on the 20 most popular categories sold under 6-digit classification code based on the same artificial data used in "Basic Commands".

The report view of the output

Top 20 categories sold under 6-digits classification code

Rank 6-digit classification code Category name Quantity
1 140323 isotonic drink 1233
2 130121 candy, caramel 930
3 140325 nutritional drink 860
4 110605 instant (miso) soup 797
5 110705 frozen farm products 757
6 111603 tube-shaped fishcake 757
7 130129 rice cake 741
8 111701 pickles 692
9 110209 olive oil 659
10 130305 daily farm products 643
11 140797 alcoholic drinks 621
12 110119 cooking wine 615
13 111605 cake pounded fish 615
14 130137 Japanese sweets 612
15 140601 sake 611
16 140211 tomato juice 603
17 111801 tofu 588
18 111207 raw and boiled noodles 578
19 130205 yogurt 571
20 111001 wheat flour 570

Creating the report

Let's use the commands learned in "Basic Commands" and apply them to rank and select the top 20 most popular categories, and publish the results by converting xmlTable to html file.

Before we start, let us sit back and think about how to process the data by selecting and combining the commands in order to produce the above report. 

The process:

  1. Select "CategoryCode6" and "Quantity"  
  2. Compute the total of quantity for each 6-digit category code item  
  3. Join the category codes to the corresponding article name  
  4. Sort the category codes by quantity in descending order  
  5. Select first twenty records in the data  
  6. Ranking the twenty records in order 
  7. Rearrange the attributes in the order of "rank", "article code", "article name" and "quantity"  
  8. Convert xmlTable to HTML file

This is one of the possible solutions, the commands used in this example can be used interchangably with other commands where appropriate. Please refer to "One Point" at   the end of this lesson. Now, let's explain the logistics of steps 5, 6 and 8.

Step 5: Selecting records by specifying line number (xtbest)

Th command xtbest  selects the records with the same key value in order by the range specified at -R. If the key value is not defined, the selection of records will be based on all records in the data. The format of parameter at -R is: firstLine_LastLine . Note that xtbest takes into account the information in the sort tag and returns records in that ordering. The following are some examples:

Definition Interpretation
1_20 line numbers from 1 to 20
1_20 line numbers greater than 1 and are smaller than 20
5_10 line numbers greater than or equal to 5 and are smaller than 20
MIN_15 line numbers smaller than or equal to 15
20_MAX line numbers greater than 20
MIN_MAX all

Solution: In our example, the command for choosing the top 20 lines is "xtbest -R MAX_20".

Step 6: Appending numbering to each record (xtnumber)

In order to give the sequential number for lines, we use xtnumber. This command sequentially numbers the records with the same key value. When -k is omitted, all records will be based on the same key value. The starting value is specified by parameter -S and the interval is specified by parameter -I.  For example, the parameter "-S 10 -I 2", will turn out to be 10, 12, 14 and so forth. When neither -S nor -I are sepcified, the default value will be "-S 1 -I 1".

When -B is specified, the same number is assigned to the lines of the same key value.

The attribute name forthe new number sequence is specified at -a.

Solution: To generate a sequence of numbers is done by "xtnumber -a order".

Step 8: Publishing xmlTable to HTML reports (xt2html)

The command xt2html  converts xmlTable file to html file. This command converts the content in the tag in the xmlTable into a table in an html file. The attribute tilte and comments in the

tag will be written out as the head and title tag of the table.

FD has a built in HTML conversion function. To publish the report, first highlight the data file in FD, then press "H". A submenu will pop up at the bottom, select the output format you would like to publish. Text format allows you to import data into spreadsheets and analytical tools such as SPSS / SAS, while HTML format will have the convenience to publish reports on the web and share with others.

Script

We have briefly explained the procedure to create the script for this solution, let's see how the actual script is written below.
#/bin/bash
#===============================================================
# MUSASHI bash script
#===============================================================

#---- Title
title="Top 20 6-digit categories sold"

#---- Comment
comment="Basic reports"

#---- Variables
inPath="/home/public/tutorial"

#---------------------------------------------------------------
# Commands
#---------------------------------------------------------------
xtcut -f CategoryCode6,Quantity -i $inPath/dat.xt |
xtagg -k CategoryCode6 -f Quantity -c sum |
xtjoin -k CategoryCode6 -m $inPath/gicfs6.xt -f CategoryCode6Name |
xtbest -R MIN_20 -s Quantity%n%r |
xtnumber -a Rank |
xtcut -f Rank,CategoryCode6,CategoryCode6Name,Quantity |
xtheader -l "$title" -c "$comment" -o best.xt
#===============================================================

The result from the execution of script is shown below:

<?xml version="1.0" encoding="euc-jp"?>
<xmltbl version="1.00">
<header>
<title>
Top 20 6-digit categories sold
</title>
<comment>
Tutorial
</comment>
<field no="1">
<name>Rank</name>
</field>
<field no="2">
<name>CategoryCode6</name>
<sort priority="2">
</sort>
</field>
<field no="3">
<name>CategoryCode6Name</name>
<sort priority="3">
</sort>
</field>
<field no="4">
<name>Quantity</name>
<sort priority="1">
</sort>
</field>
</header>
<body><![CDATA[
1 140323 isotonic drink 1233
2 130121 candy, caramel 930
3 140325 nutritional drink 860
4 110605 instant (miso) soup 797
5 110705 frozen farm products 757
6 111603 tube-shaped fishcake 757
7 130129 rice cake 741
8 111701 pickles 692
9 110209 olive oil 659
10 130305 daily farm products 643
11 140797 alcoholic drinks 621
12 110119 cooking wine 615
13 111605 cake pounded fish 615
14 130137 Japanese sweets 612
15 140601 Sake 611
16 140211 tomato juice 603
17 111801 tofu 588
18 111207 raw adn boiled noodles 578
19 130205 yogurt 571
20 111001 wheat flour 570
]]></body>
</xmltbl>

The HTML file based on the script's name will be generated. Unlike text files which the results can be check from FD, you will need to use the web browser to check the results of the HTML report.
OnePoint - Processing Order
  There is no fix way to arrange the order of the commands in certain circumstances. In the above example, reverse execution order of xtagg and xtcut will result the same. Further, the selection of the first 20 rows of records is done after joining category name, it can however be done after the selection of records. xtcal command can also be used in place of xtnumber. When composing the script, what's important is the order of commands, selection of commands, and decision on optimizing the processing efficiency in an ingenious manner.
  For example, the amount of process data can be reduced by executing xtcut first before xtagg to increase the efficiency. Joining extra attributes to the data set after performing all necessary functions such as selecting the first 20 rows can save the extra processing done on the data which will eventually be filtered out and thus improve efficiency.
  Even though xtcal and xtnumber can carry out the same function, they both have different characteristics. xtcal has more functionality and xtnumber is more efficient in generating sequential numbers.

Exercises

The following are more sample reports pertained to the use of selecting the best records. Check your results with the scripts and output files given below.
Report Name Script Name Result - xt Result - HTML
Top 20 6-digit categories with the highest sales amount best1.sh best1.xt best1.html
Top 20 manufacturers' by unit sales best2.sh best2.xt best2.html
The 20 lowest sales amount in 6-digit categories best3.sh best3.xt best3.html
6-digit categories which made the top 20-30 best4.sh best4.xt best4.html