
Estimating the accuracy of the model


The previous explanation should have given you a rough idea of what a classification model built with a decision tree looks like. Next we move on to the accuracy of the model. When building a classification model, accuracy is extremely important: a model that always "misses" when it is applied to unknown data is of no use.
The accuracy of a classification model is the proportion of cases that are classified correctly when all cases (14 in the golf example) are run through the model. In the golf example all 14 cases are classified correctly, so the accuracy is 100%.
However, for these 14 cases the "answer", namely whether or not golf was played, is already known, so it is only natural that we can build a model that classifies exactly those data correctly. The real question is whether the model is also accurate when it meets unknown data.
This can be compared to studying for an exam. If you keep solving the problems in a study guide "while looking at the answers", you will probably reach the point where you can answer every problem in that guide 100% correctly. That does not guarantee a good result on the real exam (where you cannot look at the answers!).
It therefore becomes necessary to estimate, by some method, the accuracy of the model on unknown data. In terms of exam preparation, you need a yardstick such as "your probability of passing the entrance exam for University XX is 60%". There are three main methods of estimating the accuracy of a model (accuracy estimation).

A) Resubstitution method (Resubstitution estimate)

This is a very simple idea. In terms of the example above, the cases that were used to build the model (N cases) are applied once more to the resulting model, the decision tree d, and the proportion that is classified correctly (t cases) is taken as the estimate of model accuracy. The accuracy A(d) given by the resubstitution method is expressed by the following formula.

A(d) = t / N

However, because the data used to build the model are reused as they are to estimate its accuracy, this is a very optimistic estimation method (it tends to report high accuracy).
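As a minimal sketch of why the resubstitution estimate is optimistic, the following Python snippet scores a model on the same cases it was built from. The memorizing "model" is a hypothetical stand-in for a decision tree, and the four weather cases are made up for illustration; they are not the 14 golf cases.

```python
def accuracy(model, cases):
    """Proportion of cases whose predicted label matches the true label (t / N)."""
    correct = sum(1 for features, label in cases if model(features) == label)
    return correct / len(cases)


def fit_memorizer(training_cases, default="no"):
    """Hypothetical stand-in for a decision tree: it simply memorizes the answers."""
    table = {features: label for features, label in training_cases}
    return lambda features: table.get(features, default)


# Made-up (outlook, temperature) cases with a play-golf label.
training = [
    (("sunny", "hot"), "no"),
    (("overcast", "hot"), "yes"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "no"),
]

model = fit_memorizer(training)

# Resubstitution estimate: score the model on its own training cases.
resub = accuracy(model, training)
print(resub)  # 1.0, because every training case was memorized

# A case the model has never seen falls back to the default answer.
unseen = [(("sunny", "mild"), "yes")]
print(accuracy(model, unseen))  # 0.0
```

The perfect score on the training cases says nothing about the unseen case, which is exactly the "solving problems while looking at the answers" situation.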

B) Test sample method (Test sample estimate)

The test sample method, by contrast, prepares the data for building the model and the data for estimating its accuracy separately in advance. First, the original data are divided at random into two sets, called the training data and the test data (a typical ratio is 1/3 for the test data and 2/3 for the training data). It is also desirable that both sets be representative of the original data. For example, it is a problem if one of the sets contains no "plays golf" cases at all, or contains only cases with a high temperature. For this reason a technique called stratification is usually used: when sampling from the original data, the sample is drawn so that the distribution of values of each attribute is similar to the distribution in the original data.
The model is built using the training data (the data that come with answers), and the test data are then run through that model. The classification accuracy on the test data is taken as the accuracy of the model (see the schematic diagram). In terms of exam preparation, this corresponds to solving a number of problems from the study guide without looking at the answers and then checking how many you got right.
If the total number of cases in the test data is N^ts and the number of those that the model actually classifies correctly is t^ts, the estimated accuracy A^ts(d) given by the test sample method is expressed by the following formula.

A^ts(d) = t^ts / N^ts
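The test sample procedure can be sketched in Python roughly as follows. This is an illustration under stated assumptions: the data are made up, the majority-class "model" is a hypothetical stand-in for a decision tree, and stratification is done on the class label only rather than on every attribute.

```python
import random
from collections import Counter, defaultdict


def stratified_split(cases, test_ratio=1 / 3, seed=0):
    """Randomly split cases into training and test data, label by label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for case in cases:
        by_label[case[1]].append(case)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def fit_majority(training_cases):
    """Hypothetical stand-in model: always predicts the most common label."""
    majority = Counter(label for _, label in training_cases).most_common(1)[0][0]
    return lambda features: majority


def accuracy(model, cases):
    """Proportion of cases classified correctly (t_ts / N_ts on test data)."""
    correct = sum(1 for features, label in cases if model(features) == label)
    return correct / len(cases)


# Made-up data: 9 "yes" cases and 6 "no" cases; the feature is just an id.
cases = [((i,), "yes") for i in range(9)] + [((i,), "no") for i in range(9, 15)]

train, test = stratified_split(cases)
model = fit_majority(train)      # built from the training data only
a_ts = accuracy(model, test)     # estimated on the held-out test data
print(len(train), len(test), a_ts)
```

Because the split is stratified, the test data keep the 9:6 ratio of the original data (3 "yes" and 2 "no" cases), so the estimate does not depend on an unlucky draw of labels.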

C) Cross-validation method (n-fold cross-validation estimate)

The test sample method causes no problem when there is a sufficient quantity of data, but when the quantity of data is small, there is a high risk that the estimated accuracy will contain a large error depending on how the test data happen to be chosen. In that case, the cross-validation method explained here is used. With this method, the original data are first divided into n blocks (n is usually 10 to 20). The number of cases allotted to each block should be the same, and it is also desirable that each block be stratified.
First, the first block is used as the test data and the other blocks as the training data, and the model is built and its accuracy calculated. Next, the second block is used as the test data and the other blocks as the training data, and a model is built again. This procedure is repeated n times, and the average of the accuracies calculated in each round is taken as the estimated accuracy of the model. As you may have noticed, each round is simply the test sample method with a test data ratio of 1/n.
With cross-validation, every case is chosen as test data exactly once, and each model is built from training data containing n - 1 times as many cases as one block, so the estimation error is reduced even when the original data are few.
As in the figure below, if the decision tree built in the i-th round is d_i and its accuracy is A^ts(d_i), the estimated accuracy A^cv(d) given by cross-validation is expressed by the following formula.

A^cv(d) = (1/n) × (A^ts(d_1) + A^ts(d_2) + ... + A^ts(d_n))
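The n-fold procedure can be sketched in Python roughly as follows. As before, this is only an illustration: the data are made up, the majority-class "model" is a hypothetical stand-in for a decision tree, and the blocks are stratified on the class label only. The toy data set has just 20 cases, so n = 4 is used instead of the usual 10 to 20.

```python
import random
from collections import Counter


def fit_majority(training_cases):
    """Hypothetical stand-in model: always predicts the most common label."""
    majority = Counter(label for _, label in training_cases).most_common(1)[0][0]
    return lambda features: majority


def accuracy(model, cases):
    """Proportion of cases classified correctly."""
    correct = sum(1 for features, label in cases if model(features) == label)
    return correct / len(cases)


def make_blocks(cases, n, seed=0):
    """Divide cases into n blocks, dealing each label out evenly."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    shuffled.sort(key=lambda case: case[1])  # stable sort: group by label
    return [shuffled[i::n] for i in range(n)]


def cross_validation_accuracy(cases, n, fit, seed=0):
    """Average the test accuracy over n rounds, one round per block."""
    blocks = make_blocks(cases, n, seed)
    scores = []
    for i, test_block in enumerate(blocks):
        training = [c for j, block in enumerate(blocks) if j != i for c in block]
        scores.append(accuracy(fit(training), test_block))
    return sum(scores) / n


# Made-up data: 12 "yes" cases and 8 "no" cases; the feature is just an id.
cases = [((i,), "yes") for i in range(12)] + [((i,), "no") for i in range(12, 20)]

a_cv = cross_validation_accuracy(cases, n=4, fit=fit_majority)
print(a_cv)
```

Each case lands in exactly one block, so it serves as test data once and as training data in the other n - 1 rounds.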
