Induction of Decision Trees

(File Last Modified Wed, May 29, 2002.)


Review: Induction of Decision Trees

Concepts

  1. Learning: the acquisition of structured knowledge in the form of concepts, discrimination nets, or production rules. It enables a system to do the same task more efficiently the next time.
  2. Machine learning
    • Understand and improve the efficiency of human learning -- computer-aided instruction
    • Discover new things or structure unknown to humans -- data mining
    • Fill in skeletal or incomplete specifications about a domain -- knowledge-based expert systems
  3. Inductive learning: uses specific examples to reach general conclusions. The goal is to develop a classification rule that can determine the class of any object from the values of its attributes.

Approach - Decision Trees

  • Decision trees: a simple inductive learning structure. The learner searches for patterns in the given examples and must be able to examine all of them during learning. Given an instance of an object, specified by the values of a set of attributes, the tree returns a P/N (positive/negative) classification of that instance (see the sketch after this list)
  • Examples - ID3, C4 (the TDIDT family: Top-Down Induction of Decision Trees)
  • Bias: a preference for one hypothesis over another.
  • Ockham's razor: the simplest hypothesis that is consistent with all observations is the best. A simpler tree is more likely to capture the structure underlying the problem, and thus to classify more future objects correctly. Excessive complexity turns a decision tree into an explanation of the training set rather than a generalization from it.
  • Criterion for evaluating performance -- predictive accuracy on unseen objects
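
To make the classification step concrete, here is a minimal Python sketch. The nested-dict representation and the weather attributes (which echo the small "Saturday morning" example in Quinlan's paper) are choices made here for illustration, not a prescribed implementation:

    # A decision tree as nested dicts: an internal node maps one attribute
    # name to {attribute value: subtree}; a leaf is the class label 'P' or 'N'.
    tree = {"outlook": {"sunny": {"humidity": {"high": "N", "normal": "P"}},
                        "overcast": "P",
                        "rain": {"windy": {"true": "N", "false": "P"}}}}

    def classify(node, instance):
        """Follow the instance's attribute values from the root to a leaf."""
        while isinstance(node, dict):
            attribute = next(iter(node))   # the attribute tested at this node
            node = node[attribute][instance[attribute]]
        return node

    print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # P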

Construct the decision tree

Algorithm

  • Recursively/iteratively select the "best attribute" to test at the current node in the tree
  • Choose the attribute A with the maximum information gain as the root of the current (sub)tree
  • Partition the set of objects according to their values of A
  • Repeat the procedure on each subset until a subset contains objects of one class only; that node becomes a leaf (a runnable sketch of this loop follows)
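
A minimal Python sketch of this procedure, assuming each example is an (attribute-dict, class-label) pair; the helper names info, gain, and id3 are mine, and the code illustrates the idea rather than reproducing Quinlan's implementation. It grows trees in the same nested-dict shape used by classify() above:

    import math
    from collections import Counter

    def info(labels):
        """Information (entropy) in bits of a collection of class labels."""
        total = len(labels)
        return -sum((c / total) * math.log2(c / total)
                    for c in Counter(labels).values())

    def gain(examples, attribute):
        """Info of the whole set minus the weighted average info of the
        subsets that testing `attribute` induces."""
        labels = [cls for _, cls in examples]
        remainder = 0.0
        for value in {x[attribute] for x, _ in examples}:
            subset = [cls for x, cls in examples if x[attribute] == value]
            remainder += len(subset) / len(examples) * info(subset)
        return info(labels) - remainder

    def id3(examples, attributes):
        """Grow a tree top-down; examples is a list of (dict, label) pairs."""
        labels = [cls for _, cls in examples]
        if len(set(labels)) == 1:          # pure subset: this branch is a leaf
            return labels[0]
        if not attributes:                 # no tests left: take the majority class
            return Counter(labels).most_common(1)[0][0]
        best = max(attributes, key=lambda a: gain(examples, a))
        tree = {best: {}}
        for value in {x[best] for x, _ in examples}:
            subset = [(x, cls) for x, cls in examples if x[best] == value]
            tree[best][value] = id3(subset, [a for a in attributes if a != best])
        return tree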

Entropy-based criteria

  1. Gain criterion: the information conveyed by a message depends on its probability and can be measured in bits as minus the logarithm to base 2 of that probability. Select the test that gains the most information, i.e. that most reduces the information still required to classify the objects.
  2. Gain ratio criterion: the gain criterion has a strong bias in favor of tests with many outcomes. This can be rectified by a kind of normalization:
               gain ratio(X) = gain(X) / split info(X)
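
Written out in the notation of Quinlan (1986), for a training set with p positive and n negative objects and a test X whose v outcomes partition it into subsets with (p_i, n_i) objects, the quantities above are (a LaTeX transcription):

    I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}

    E(X) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)

    \mathrm{gain}(X) = I(p, n) - E(X)

    \mathrm{split\,info}(X) = -\sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\log_2\frac{p_i + n_i}{p + n}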

(under construction...)

