Very Simple Classification Rules Perform Well On Most Commonly Used Datasets
- 1-Rule: a rule that classifies an object on the basis of a single attribute.
- 1R: the learning system, whose input is a set of training examples and whose output is a 1-rule.
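The 1R procedure can be sketched in a few lines: for each attribute, build a rule that maps each attribute value to its most frequent class, then keep the attribute whose rule makes the fewest errors on the training examples. This is a minimal illustrative sketch, not the paper's original code; the toy data and names are made up.

```python
from collections import Counter, defaultdict

def one_r(examples, n_attrs):
    """Induce a 1-rule: for each attribute, map each of its values to the
    most frequent class among examples having that value; return the
    attribute whose rule makes the fewest training errors."""
    best = None  # (errors, attribute index, value -> class map)
    for a in range(n_attrs):
        counts = defaultdict(Counter)  # attribute value -> class frequencies
        for attrs, cls in examples:
            counts[attrs[a]][cls] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(cls != rule[attrs[a]] for attrs, cls in examples)
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    return best

# Toy dataset: each example is ((attribute values...), class)
data = [(('sunny', 'hot'), 'no'), (('sunny', 'mild'), 'no'),
        (('rainy', 'hot'), 'yes'), (('rainy', 'mild'), 'yes')]
errors, attr, rule = one_r(data, n_attrs=2)
# here attribute 0 separates the classes perfectly, so errors == 0
```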
1R
- is run on 16 commonly used datasets from UCI, and the results are compared to C4.
- results: on most of the datasets, 1R's rules are only a few percentage points less accurate than C4's pruned decision trees.
1R*:
- is defined as an upper bound on the predictive accuracy that the 1R system could achieve after possible improvements to 1R's criterion for selecting rules.
- results: on almost all the datasets studied, the accuracy of 1R* turns out to be very similar to that of C4.
1Rw: the highest accuracy of the 1-rule produced when the whole dataset is used by 1R for both training and testing.
- Simple-rule learning systems are often a viable alternative to systems that learn more complex rules. If a complex rule is induced, its additional complexity must be justified by its being correspondingly more accurate than a simple rule.
- 1R can be used to predict the accuracy of the rules produced by more sophisticated machine learning systems. This prediction can serve as a benchmark accuracy.
Why does a simple machine learner perform almost as well as a complex learner like C4?
- C4 doesn't miss opportunities to exploit additional complexity in order to improve its accuracy: C4's pruned trees had the same accuracy as its unpruned ones.
- It may simply be a fact that on those particular datasets 1-rules are almost as accurate as more complex rules.
- In some datasets, the classes and the values of some attribute are almost in 1-1 correspondence.
Most of these datasets are typical of the data available in a commonly occurring class of real classification problems.
- The datasets are drawn from real-life domains, as opposed to having been constructed artificially.
- The particular examples and attributes in the datasets have not been specially engineered by the ML community to make them easy.
The ``simplicity first'' methodology is a promising alternative to the existing methodology, whose main premise is that a learning system should search in very large hypothesis spaces containing very complex hypotheses.
This paper is an experimental report. It compares 1R and C4 through a detailed description of three sets of experiments. For each set of experiments, it gives results followed by an analysis of those results and their implications. The analysis is very concrete and includes explanations of the exceptions.
The fact that ``simple learners work almost as well as complex ones'' seems to support your theory that ``there always exist key variables (simple rules) that control the world''. But the author has a different view, which I think has a point:
- There is no theoretical proof of this phenomenon; it may simply be a fact that it is true on these particular datasets.
- Those particular datasets turn out to be ``representative'' of the datasets that actually arise in practice.
- Although it is true that some real-world problems do not have simple solutions, this doesn't imply that all real problems are hard.
- So the justification of simple learners is: they are a desirable solution for a kind of real-world problem. Furthermore, they generate insight into the tradeoff between accuracy and complexity.
The author claims: systems designed using the ``simplicity first'' methodology are guaranteed to produce rules that are near-optimal with respect to simplicity. If the accuracy of the rule is unsatisfactory, then there does not exist a satisfactory simple rule. I am wondering: if tar2 fails on some domains, does it indicate there are no simple controllers in those domains?