
What is a classification model based on a decision tree?


Weather conditions and golf

To get an intuitive feel for what a classification model is, let us work through the figure below. It records whether a certain person played golf or did not play golf under various weather conditions. The chart on the left shows the 9 cases in which the person played golf, together with the weather data. The chart in the middle shows the 5 cases in which the person did not play. Now for the question. Using the past cases in these two charts as a reference, will this person play golf tomorrow, when the forecast is weather "sunny", temperature "60", humidity "90", and wind "no"?
Problems of this kind come up all the time in daily life. People usually judge on the basis of "something" they have gained from past experience. It is hard to spell that something out completely, but a model is what you get when you spell it out in a restricted form for the purpose of classification (prediction). In the golf example, we build a classification model from the 14 past cases that describes whether the person plays golf in terms of a limited set of four weather attributes, and then use that model to judge unknown data, that is, cases where we do not yet know whether the person plays golf.


Source: J.R. Quinlan, "Data Analysis by AI" (Japanese edition), 1995, p. 18; adapted with revisions from Figure 2-1.

Are there hidden rules?

Now, suppose we build a model from the 14 cases above. Under what weather conditions does this person play golf (or, conversely, under what conditions does this person not play)? Try to find the rules. As you have probably noticed, the first thing that catches the eye is that when the weather is "cloudy", only cases of playing golf are observed. In other words, we can presume that this person tends to play golf whenever it is cloudy.
Then what about "rainy" days? Looking at the rainy cases, the person sometimes plays and sometimes does not, so rain by itself does not seem to determine whether golf is played. Focusing on "wind" within the rainy cases, however, every case of playing (3 cases) has no wind, and every case of not playing (2 cases) has wind. In other words, on rainy days the presence or absence of wind seems to affect whether golf is played.
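The two observations above can be checked mechanically. The following is a minimal sketch in Python, assuming the 14 cases are encoded with the English tokens "sunny", "cloudy", "rainy" for weather and "yes"/"no" for wind and golf. The helper name `golf_counts` is hypothetical and exists only for this illustration.

```python
from collections import Counter

# The 14 past cases: (weather, temperature, humidity, wind, golf).
CASES = [
    ("sunny", 85, 85, "no", "no"),
    ("sunny", 80, 90, "yes", "no"),
    ("cloudy", 83, 78, "no", "yes"),
    ("rainy", 70, 96, "no", "yes"),
    ("rainy", 68, 80, "no", "yes"),
    ("rainy", 65, 70, "yes", "no"),
    ("cloudy", 64, 65, "yes", "yes"),
    ("sunny", 72, 95, "no", "no"),
    ("sunny", 69, 70, "no", "yes"),
    ("rainy", 75, 80, "no", "yes"),
    ("sunny", 75, 70, "yes", "yes"),
    ("cloudy", 72, 90, "yes", "yes"),
    ("cloudy", 81, 75, "no", "yes"),
    ("rainy", 71, 80, "yes", "no"),
]


def golf_counts(weather=None, wind=None):
    """Count golf outcomes among the cases that match the given conditions."""
    counts = Counter()
    for case_weather, _temperature, _humidity, case_wind, golf in CASES:
        if weather is not None and case_weather != weather:
            continue
        if wind is not None and case_wind != wind:
            continue
        counts[golf] += 1
    return dict(counts)


cloudy = golf_counts(weather="cloudy")                  # only "yes" outcomes
rainy = golf_counts(weather="rainy")                    # mixed outcomes
rainy_calm = golf_counts(weather="rainy", wind="no")    # only "yes" outcomes
rainy_windy = golf_counts(weather="rainy", wind="yes")  # only "no" outcomes
```

Rain alone gives a mixed result, while rain combined with wind separates the cases cleanly, which is exactly the pattern described in the text.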

Enormous search space

Above, looking at the cases where the weather is cloudy or rainy, we found two rules. But what about the other attributes: temperature, humidity, and wind? For example, what happens when the temperature is high, and when it is low? And depending on where you draw the line between high and low, many different viewpoints emerge. Moreover, to build a classification model we must find a set of rules that explains all of the cases. Having found a rule for cloudy weather is not enough if the rule for sunny weather remains unknown.
The golf example is very small, only 14 cases with four attributes, so a person could check every possibility by hand, although it would be backbreaking work. Now imagine a large data set of 100,000 cases with 100 attributes. That is clearly beyond what a person can do by hand. I hope this makes clear that building a classification model means dealing with an enormous search space. Having a computer do the calculation does not make the problem easy either. For that reason, the techniques proposed for building accurate classification models efficiently rely mainly on heuristics.
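To make the idea of a heuristic concrete, here is a minimal sketch in Python of one widely used greedy criterion, information gain: each attribute is scored by how much it reduces the entropy of the golf outcome, and the best-scoring one is chosen first. This is an assumption for illustration only. The text does not say which heuristic xtclassify uses, and the function names and English value tokens below are hypothetical.

```python
import math
from collections import Counter

FIELDS = ("weather", "temperature", "humidity", "wind")

# The 14 past cases: (weather, temperature, humidity, wind, golf).
CASES = [
    ("sunny", 85, 85, "no", "no"), ("sunny", 80, 90, "yes", "no"),
    ("cloudy", 83, 78, "no", "yes"), ("rainy", 70, 96, "no", "yes"),
    ("rainy", 68, 80, "no", "yes"), ("rainy", 65, 70, "yes", "no"),
    ("cloudy", 64, 65, "yes", "yes"), ("sunny", 72, 95, "no", "no"),
    ("sunny", 69, 70, "no", "yes"), ("rainy", 75, 80, "no", "yes"),
    ("sunny", 75, 70, "yes", "yes"), ("cloudy", 72, 90, "yes", "yes"),
    ("cloudy", 81, 75, "no", "yes"), ("rainy", 71, 80, "yes", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def split_gain(groups, labels):
    """Entropy reduction obtained by partitioning `labels` into `groups`."""
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups)
    return entropy(labels) - remainder


def best_gain(cases, column):
    """Return (gain, threshold); threshold is None for category attributes."""
    labels = [case[-1] for case in cases]
    values = sorted({case[column] for case in cases})
    if isinstance(values[0], str):
        # Category attribute: one group per distinct value.
        groups = [[c[-1] for c in cases if c[column] == v] for v in values]
        return split_gain(groups, labels), None
    # Numeric attribute: try every "value <= threshold" cut point.
    best, best_threshold = 0.0, None
    for threshold in values[:-1]:
        low = [c[-1] for c in cases if c[column] <= threshold]
        high = [c[-1] for c in cases if c[column] > threshold]
        gain = split_gain([low, high], labels)
        if gain > best:
            best, best_threshold = gain, threshold
    return best, best_threshold


gains = {name: best_gain(CASES, i) for i, name in enumerate(FIELDS)}
best_attribute = max(gains, key=lambda name: gains[name][0])
```

A greedy method like this examines only one attribute at a time at each step instead of enumerating every possible set of rules, which is how it avoids the enormous search space at the cost of not guaranteeing the best possible tree.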

Representing the model as a decision tree

One such technique represents the classification model in a form called a "decision tree". The figure shows an example of a decision tree for the golf example above (how such a tree is built is described later). It is called a decision tree because, turned upside down, the figure is shaped like a tree. Think of the "yes"/"no" personality quizzes that often appear in women's magazines. In the figure, questions are shown in white boxes, and you answer each question with "yes" or "no" and follow the branch indicated by the arrow. When you reach a colored box, that box is the answer. The number in parentheses is the number of cases that match the condition. The xtclassify command builds such a classification model automatically.
Once a model like this has been built, the answer to our first question (does the person play golf when weather = "sunny", temperature = "60", humidity = "90", and wind = "no"?) is easy to obtain. Start with the question at the top node of the tree, "weather = cloudy?". The answer is "no" (weather = sunny), so follow the left branch. The next question is "weather = rainy?", which is also "no", so again follow the left branch. The next question is "humidity <= 75?". Because the forecast humidity is 90, the answer is "no", and following that branch leads to the leaf "does not play golf". The model therefore estimates that this person will not play golf tomorrow.
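The walk through the tree can be sketched in Python as follows. The questions "weather = cloudy?", "weather = rainy?", and "humidity <= 75?" come from the text. The "wind = yes?" node under the rainy branch is an assumption based on the rule observed earlier for rainy days, and the tuple layout and the `classify` function are hypothetical names for this illustration, not xtclassify output.

```python
# Internal node: (question, test function, subtree if "yes", subtree if "no").
# Leaf: a plain string holding the answer.
TREE = (
    "weather = cloudy?", lambda c: c["weather"] == "cloudy",
    "plays golf",
    (
        "weather = rainy?", lambda c: c["weather"] == "rainy",
        # Assumed rainy-day subtree: wind decides the outcome.
        ("wind = yes?", lambda c: c["wind"] == "yes",
         "does not play golf", "plays golf"),
        ("humidity <= 75?", lambda c: c["humidity"] <= 75,
         "plays golf", "does not play golf"),
    ),
)


def classify(node, case):
    """Follow yes/no branches until a leaf is reached; also return the path."""
    path = []
    while not isinstance(node, str):
        question, test, if_yes, if_no = node
        answer = test(case)
        path.append((question, "yes" if answer else "no"))
        node = if_yes if answer else if_no
    return node, path


tomorrow = {"weather": "sunny", "temperature": 60, "humidity": 90, "wind": "no"}
prediction, path = classify(TREE, tomorrow)
```

For the forecast in the question, the humidity of 90 fails the test "humidity <= 75?", so the walk ends at the leaf "does not play golf". Note that temperature never appears in this tree: an attribute that does not help separate the cases is simply not asked about.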

About the data

This section describes the basic data used to build a decision tree. xtclassify requires data of the kind shown below. The data must contain a result attribute field and explanatory attribute fields.
The result attribute is the attribute that records the outcome, in the golf example whether or not the person played golf. It must always be provided as one field. xtclassify can handle up to 10 distinct values of the result attribute (in the golf example there were two: "plays golf" and "does not play golf").
The explanatory attributes are, in the golf example, attributes such as weather and humidity. The xtclassify command can handle up to 256 explanatory attribute fields. Explanatory attribute values can be of three types: numeric, category, and pattern (for the handling of the pattern type, see the chapter on BONSAI). In the golf example, temperature and humidity are numeric, and weather and wind are category.
In addition, xtclassify allows unknown values (NULL values) as values of explanatory attributes (for details, see the chapter on the handling of NULL values).

<?xml version="1.0" encoding="euc-jp"?>
<xmltbl version="1.1">
<header>
<field no="1" name="weather"></field>
<field no="2" name="temperature"></field>
<field no="3" name="humidity"></field>
<field no="4" name="wind"></field>
<field no="5" name="golf"></field>
</header>
<body><![CDATA[
sunny 85 85 no no
sunny 80 90 yes no
cloudy 83 78 no yes
rainy 70 96 no yes
rainy 68 80 no yes
rainy 65 70 yes no
cloudy 64 65 yes yes
sunny 72 95 no no
sunny 69 70 no yes
rainy 75 80 no yes
sunny 75 70 yes yes
cloudy 72 90 yes yes
cloudy 81 75 no yes
rainy 71 80 yes no
]]></body>
</xmltbl>
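As a rough illustration of how such a table could be read outside of xtclassify, the following Python sketch parses the header and body with the standard library. It assumes one record per line inside the CDATA body, whitespace-separated values in header field order, and English value tokens ("sunny", "yes", "no"). `read_xmltbl` and `check_limits` are hypothetical helper names. The XML declaration is left out of the sample string for simplicity.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<xmltbl version="1.1">
<header>
<field no="1" name="weather"></field>
<field no="2" name="temperature"></field>
<field no="3" name="humidity"></field>
<field no="4" name="wind"></field>
<field no="5" name="golf"></field>
</header>
<body><![CDATA[
sunny 85 85 no no
sunny 80 90 yes no
cloudy 83 78 no yes
rainy 70 96 no yes
rainy 68 80 no yes
rainy 65 70 yes no
cloudy 64 65 yes yes
sunny 72 95 no no
sunny 69 70 no yes
rainy 75 80 no yes
sunny 75 70 yes yes
cloudy 72 90 yes yes
cloudy 81 75 no yes
rainy 71 80 yes no
]]></body>
</xmltbl>"""


def read_xmltbl(text):
    """Return (field names, rows) where each row is a dict keyed by field name."""
    root = ET.fromstring(text)
    fields = sorted(root.find("header").findall("field"),
                    key=lambda field: int(field.get("no")))
    names = [field.get("name") for field in fields]
    rows = []
    for line in root.find("body").text.splitlines():
        values = line.split()
        if not values:
            continue  # skip blank lines around the records
        # Values made only of digits are treated as numeric, others as category.
        rows.append({name: int(value) if value.isdigit() else value
                     for name, value in zip(names, values)})
    return names, rows


def check_limits(names, rows, result="golf"):
    """Check the limits stated in the text: at most 256 explanatory
    attribute fields and at most 10 distinct result attribute values."""
    explanatory = [name for name in names if name != result]
    classes = {row[result] for row in rows}
    return len(explanatory) <= 256 and len(classes) <= 10


names, rows = read_xmltbl(SAMPLE)
within_limits = check_limits(names, rows)
```

Here "golf" plays the role of the result attribute and the other four fields are the explanatory attributes, two numeric and two category.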