Japanese Site |
Overview | Applications | Documentation | Training | Development |
XML Table1. Why XML (limitation of relational database? MUSASHI describes data for processing in XML. It is presumed here that readers have fundamental knowledge. 1. Why XML (limitation of relational database)?Before explaining why MUSASHI uses XML, please answer the following questions.
1.1. Limitation of table based data modelingIn most of companies, the information system uses relational database for storing their data. Relational database represents data as a table consisting of attributes and lines. However, we often encounter the situation in companies that real workflow is difficult to be modeled by a table. As a familiar example, let us consider to write a diary using Excel. Real work is an event which occurs as time proceeds. However, table based data modeling as in relational database, daily operation is usually modeled by things such as article, customer, order voucher. With this modeling, it is difficult to model a temporal relation such as interruption at point of sales terminal. 1.2. Disadvantage of database design Additional problem with relational database is that database design is difficult and time consuming. When designing a database, initially people of the branch of information system and those working in daily operations discussed which data is necessary to be stored in a database. After this, they start to design a data structure. Database experts then spend a plenty of time to examine several details such as normalization and constraints in order to make a perfect design. This is because once it is used in daily operation, modification of an database requires a lot of time. 1.3. data disappearanceWhen designing relational database, system developer first examines which data ought to be stored in a database. The data items other than those which are to be stored will be neglected. As was in the question posed above, if a user does not need the information about changes, such information will be dropped out of design scope. However, looking at daily operation at point of sales terminal,the information about changes is displayed on the register screen. Here we introduce the following story. A president of some company who does not know much about computers murmured that all the receipt information are stored in a computer when he saw a receipt at one shop of the company. This sense is quite normal. But, the reality is that the information which was once judged unnecessary to be stored in a database does not exist in a computer at all. It may quite often happen that the original receipt information and discount price are also lost. Such data loss causes a serious problem from the viewpoint of data mining. On cannot predict which data becomes necessary for future data mining analysis. Information about changes may become necessary in unexpected situation. Therefore, we believe that it is critically important to record all data occurred. Since a hard disk of 40 GB costs only 300 dollars nowadays, hardware cost does not impose any constraint. Thus, how to record data becomes an important issue. From this viewpoint, we have paid our attention to XML. 1.4. System beyond reach of end users The final issue about current information system is that end users cannot directly reach the information system since they are not familiar with details of computers. Although relational database is considered to be easy to construct, the reality is not the case. It is hard for a general end user to understand the data structure of relational database. If information system branch replies that it is impossible for such and such reason for an end user request to add a new attribute to the existing database, the user cannot retort to the reply in general. The system based on RDB is a black box for an end user although some user familiar with computers may send an SQL query. However, the problems of data disappearance and database design still remain and general users complain, "why the information system of our company is difficult to use" and "why our system is so slow" if they had to wait for ten minutes until the answer to a simple SQL query returns. 2. Record all input data by XMLThe system we propose is very simple. All input data are recorded by XML. Data are input in various formats in an information system of a company such as data input by scanning articles at point of sales terminal, update data for article master file, online order by other companies, returns by sales staffs and etc. The data from such various input source are usually stored and managed by a relational database in a unified way. However, there are many negative effects in data management by RDB as we have seen above. In order to get rid of the effects, we propose to abandon to use RDB, but instead to record all input data in chronological order by XML. For an end user, all what exists in the system is input data, and all information he/she needs lies in there. You not need an expensive and complicated software to view the content, but all you have to do is to open the file by an editor or a browser. If you feel sacred to access the computer at the system branch, you just ask the copy of the file. A very simple solution. This is our philosophy and idea. Here are several problems we have to cope with.
2.1. Input screen and record by XMLWe will explain how we design the input screen and record input data by XML, based on a simple program for receiving orders. Suppose a staff inputs the data with a telephone receiver in one hand on the input screen shown in Figure 1. FIgure 1 Screen for order receipt The input data on this screen are customer ID, article number and article quantity. It is assumed that the other data such as customer name, article name, unit of quantity and price unit are transferred from customer master file and article master file and then displayed on the screen. It is also assumed that the amount and the total amount are calculated from input data, and that the date and the time are displayed from the internal clock of a computer.
However, there are problems with this way of recording; 1) There is no guarantee that the data displayed via article master file continues to exist in the master file forever,2) the program for computing the total summary may have a bug, and among other, 3) a user cannot understand at a first glance the details of the XML file (e.g. the article name of '10001"). For these reasons, we propose to record all data on the screen viewed by the staff. This will result in XML shown in Figure 3 below.
The problems posed above have been resolved with this way of description.
Everyone can understand the contents of the data. However, Ii this description sufficient? It has simply recorded the events occurred in the temporal order by XML. How can we describe the event of interruption input? We do not have a solution to this yet. 2.2. Process of XML dataHow do we process data accumulated by XML? The trend to use XML for data accumulation as database instead of RDB can be found in the activity of software development of Ygdracill. The data structure of XML is very flexible, but suffers inefficiency of processing speed. Also, considering that Excel of Microsoft enjoys popularity, the way to use a table for data presentation cannot be neglected due to its simplicity. From these views in our mind, we developed XML Table. The overall idea is to accumulate data by XML and to process it by XML table. Data processing in our system is done after converting XML data to XMLtable. Details will be given in the following sections. 2.3. Data update
|
<?xml version="1.0"
encoding="UTF-8"?> <xmltbl version="1.00"> <head> <field no="1"> <name>order receipt number</name> </field> <field no="2"> <name>statemnet_line_number</name> </field> <field no="3"> <name>article number</name> </field> <field no="4"> <name>article name</name> </field> <field no="5"> <name>quantity</name> </field> <field no="6"> <name>amount</name> </field> </head> <body><![CDATA[ 0001 1 010001 US beef 10 2400 0001 2 010002 meat ball 1 1200 ]]></body> </xmltbl> |
This data is obtained by converting XML data of Figure 3 by command shown in Figure 5. By -k parameter, treating the region enclosed by tag "/order receipt/statement" as a single line, number of order receipt, attributes of "statement", i.e., article number, article name, quantity and amount are output by parameter -f. The result is an XML Table as shown in Figure 6. See List of commands to see the details about how to use commnads.
xml2xt -k/order_receipt/statement -f order_receipt@number,
statement@number, article number, article name, quantity, amount |
As seen from Figure 6, XMLtable is entirely an XML document. The uppermost
element <xmltbl> has two elements <head> and <body>. In
<body> tag, data in a table form is described in which LF is used
as line delimiter, and space is used for attribute delimiter. If you
look at this part, it is a data structure commonly used in Unix, and can
also be used by awk and grep. Right after <body> tag, there is a <![CDATA[....]]>
tag which instructs XML parser to treat it as it is. Namely, in order for
XML parser to recognize LF, such CDATA tag is required (in usual, XML parser
neglects LF within a tag bu treats it as a space character). However, commands
dealing with Xmltable simply search for string "<body><![CDATA"
and regard what follows the string as a table data. However, by doing
so, if this data is displayed by an application such as InternetExplorer
which can interpret XML, a user can view it as a table well structured. If
such CDATA tag does not exist, the data will be displayed as a sequence of
character strings.
<head> tag acts as a dictionary for this data in which information
about name and position of each attribute is described by <field> tag.
Various information other than information about name and position is stored
in <head> tag. Figure 6 illustrates such XML table.
<?xml version="1.0" encoding="UTF-8"?>
<xmltbl version="1.00"> <head> <title> XML Table sample </title> <comment> Write comments here </comment> <field no="1"> <name>order_receip_number</name> <comment>comments on attributes</comment> <sort priority="1"> </sort> </field> <field no="2"> <name>statement_line_number</name> </field> <field no="3"> <name>article number</name> </field> <field no="4"> <name>article name</name> </field> <field no="5"> <name>quantity</name> <sort priority="2"> <numeric/><reverse/> </sort> </field> <field no="6"> <name>amount</name> </field> </head> <body><![CDATA[ 0001 1 010001 US beef 10 2400 0001 2 010002 meat ball 1 1200 ]]></body> </xmltbl> |
Tags available in XMLtable are shown in tree form as follows.
xmlTalbe +head +title +comment +field@no +name +comment +sort@priority +reverse +numeric +body |
The following explains what each tag means.
tag |
explanation |
<xmltbl> | tag that stands for xmltable (mandatory) |
<xmltbl><head> | meta data of table data shown in <body> (mandatory) |
<xmltbl><head><title> | title for the data (optional) |
<xmltbl><head><comment> | comment for the data (optional) |
<xmltbl><head><field> | attribute information (mandatory) "no" stands for the attribute position as number. |
<xmltbl><head><field><name> | attribute name (mandatory) |
<xmltbl><head><field><comment> | comment for the attribute (optional) |
<xmltbl><head><field><sort> | This indicates that the data is sorted by this attributej.
Attribute "priority" indicated the priority of the attribute used for sorting.
Most of commands determine the algorithm actually applied. |
<xmltbl><head><field><sort><numeric> | It is used if sorted by numerical order (optional) |
<xmltbl><head><field><sort><reverse> | It is used if sorted in decreasing order (optional) |
<xmltbl><body> | Data is inserted here (mandatory) |
MUSASHI | publications | development team | related links | mailing list | user group | |
Copyright 2004 MUSASHI |