How data processing is done with MUSASHI

The set of commands in our system follows the UNIX philosophy. A wide variety of data processing tasks can be built by combining small commands, each of which has a single function.

As already explained, our system assumes that data is stored in an XML file. To run our commands, an XML file is first converted into an XML table. To process data in an XML table, we provide many commands, ranging from an operator that selects attributes to one that joins two tables. As with most UNIX commands, every command reads XML table(s) from standard input, processes them, and writes the result to standard output. Data is normally stored on a hard disk, and redirection is used to read the input from a file and to store the output in a file.

A typical example of UNIX-style data processing is as follows.

standard input → process → standard output:
xtcut -f date <in.xt >out.xt

With in.xt as standard input, the command xtcut (with the parameter -f date) processes in.xt and writes the output into out.xt. More specifically, in.xt is an XML table as illustrated in Fig. 1, and xtcut selects the attributes specified by the parameter -f. In the example in Fig. 1, the single attribute "date" is selected from in.xt, which consists of four attributes. The result is written into the file out.xt.

Now let us consider how to combine more than one command. Two or more commands can be combined with the "pipe", a mechanism conventionally used in UNIX. A pipe links two commands so that the standard output of the preceding command becomes the standard input of the succeeding command.

input → command1 | command2 → output

Note: The commands can be called from any environment that supports pipes. This includes a shell such as bash and scripting languages such as Perl or Tcl. The standard UNIX shell (bash) is assumed throughout this document.

To illustrate the pipe, let us link xtagg after xtcut in the previous example.
This is done simply as follows:

xtcut -f date,amount <in.xt | xtagg -k date -f amount -csum >out.xt

The command xtcut selects the two attributes date and amount, which are then handed over to the command xtagg through the pipe. The command xtagg is an aggregation command. Here it sums (-csum) the values of the attribute amount for each value of the key date (-k date). The result is written into out.xt.

Combining several simple commands in this manner lets us carry out various complex computations. The approach can be compared to building with Lego blocks, where various objects are constructed by combining various types of blocks. The difference is that with Lego, the final object may change during construction, inspired by the building process itself. With our commands, the desired output is fixed and the set of available commands is known. Figuring out how to combine them, however, can be difficult, much like an intellectual puzzle. In fact, a simple data processing task can be written with only a few commands, while a complex one may require combining more than ten. Therefore, avoid writing a complex combination of all required commands on a single line. It is better to break it into several lines, each containing a small number of commands and writing its intermediate result into a work file.

In the tutorial, we illustrate how our commands can be used to carry out various tasks, ranging from simple processes to complex ones. This should convince you that our system is flexible enough to process a wide variety of tasks efficiently, even with large amounts of data.
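The idea of small single-purpose filters joined by a pipe can be sketched in Python. This does not reproduce the MUSASHI commands themselves. It is a minimal sketch under several assumptions:

- `cut`, `agg_sum` and `pipe` are hypothetical names, not part of MUSASHI.
- A table is simplified to CSV text with a header line of attribute names, instead of the real XML table format.
- The sample data is made up.

Each "command" receives its standard input as a string and returns its standard output as a string. The sketch only models selecting attributes and summing per key. It does not claim to match xtagg's actual behavior, for example its output order.

```python
import csv
import io
import pathlib
import tempfile
from decimal import Decimal
from functools import reduce
from typing import Callable

# A "command" takes its standard input as text and returns its standard output as text.
Command = Callable[[str], str]


def read_table(text: str) -> tuple[list[str], list[list[str]]]:
    """Parse the simplified table: a header line of attribute names, then data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]


def write_table(header: list[str], rows: list[list[str]]) -> str:
    """Serialize a header and rows back into the simplified table text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def cut(fields: list[str]) -> Command:
    """Hypothetical analogue of `xtcut -f ...`: keep only the named attributes."""
    def run(text: str) -> str:
        header, rows = read_table(text)
        idx = [header.index(f) for f in fields]
        return write_table(fields, [[row[i] for i in idx] for row in rows])
    return run


def agg_sum(key: str, field: str) -> Command:
    """Hypothetical analogue of `xtagg -k key -f field -csum`: sum `field` per `key`."""
    def run(text: str) -> str:
        header, rows = read_table(text)
        k, f = header.index(key), header.index(field)
        totals: dict[str, Decimal] = {}
        for row in rows:
            totals[row[k]] = totals.get(row[k], Decimal(0)) + Decimal(row[f])
        # Keys are emitted in sorted order; this is a choice of the sketch, not of xtagg.
        return write_table([key, field], [[kv, str(t)] for kv, t in sorted(totals.items())])
    return run


def pipe(*commands: Command) -> Command:
    """Chain commands so each one's output is fed to the next, like `cmd1 | cmd2`."""
    return lambda text: reduce(lambda data, cmd: cmd(data), commands, text)


# Made-up sample input with four attributes (stands in for in.xt).
in_xt = (
    "customer,date,item,amount\n"
    "A,20040101,apple,100\n"
    "B,20040101,pear,200\n"
    "A,20040102,apple,50\n"
)

# One line with a pipe, roughly `xtcut -f date,amount <in.xt | xtagg -k date -f amount -csum`.
out_piped = pipe(cut(["date", "amount"]), agg_sum("date", "amount"))(in_xt)

# The same task split into two steps with an intermediate work file.
with tempfile.TemporaryDirectory() as tmp:
    work = pathlib.Path(tmp, "work.xt")
    work.write_text(cut(["date", "amount"])(in_xt))
    out_stepwise = agg_sum("date", "amount")(work.read_text())

print(out_piped, end="")
# date,amount
# 20040101,300
# 20040102,50
```

Both variants produce the same table. Splitting the work across a work file mainly makes long command chains easier to read and lets the intermediate result be inspected.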
Copyright 2004 MUSASHI