Symbolic and Neural Learning Algorithms: An Experimental Comparison
- The division between symbolic and neural network approaches to artificial intelligence is particularly evident within machine learning.
- Although symbolic and connectionist learning systems frequently address the same general problem, little is known about their comparative strengths and weaknesses.
- This paper presents the results of several experiments comparing the performance of the ID3 symbolic learning algorithm with the perceptron and backpropagation neural algorithms.
- ID3, the perceptron, and backpropagation are chosen as representative algorithms of the two paradigms.
- Five data sets are used in four experiments for each algorithm; four of the data sets have previously been used to test symbolic learning systems, and one has been used to test backpropagation.
Experiment 1: compares the learning times and accuracies of the three algorithms.
- For the most part, all three learning systems achieve similar accuracy on novel examples.
- The perceptron performs poorly when the data sets are not linearly separable.
- ID3 performs poorly on the data set that contains many numerically-valued features.
- ID3 and the perceptron train much faster than backpropagation.
Experiment 2: compares relative performance as a function of the amount of training data.
- For small amounts of training data, backpropagation is perhaps the better choice.
- ID3's tendency to perform worse on small training sets may be related to the problem of small disjuncts: the simplest conjunction that covers a small set of examples is frequently an overgeneralization. ID3 (and other symbolic induction systems) will perform better on data sets where the number of classes is small, since each class then contains more examples and the risk of small disjuncts is reduced.
Experiment 3: investigates the performance of the learning algorithms in the presence of three types of imperfect data.
- Random noise: backpropagation appears to handle random noise slightly better than ID3 does. Explanation: backpropagation makes its decisions by simultaneously weighing all the input features, while ID3 sequentially considers feature values as it traverses the tree; a single noisy value tested early in the tree can therefore significantly alter the classification.
- Missing feature values: backpropagation handles missing feature values better than ID3 and the perceptron. Explanation: backpropagation naturally supports representing partial evidence for a feature's value.
- Completely dropped features: both ID3 and backpropagation do surprisingly well as the number of features is reduced.
Experiment 4: investigates the value of distributed output encodings.
- The use of a distributed encoding of the output substantially improves the performance of backpropagation.
- ID3 can also use the encoding, but its performance with the two encodings is roughly equivalent.
- The perceptron's performance degrades with the distributed encoding.
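To make the idea concrete (the bit patterns below are my own hypothetical example, not the paper's actual encodings, which come from its test domains): a distributed output code assigns each class a multi-bit pattern instead of one output unit per class, and a network's real-valued outputs are decoded to the nearest code word.

```python
def nearest_code(output, codes):
    """Decode a real-valued output vector to the class whose code word
    is closest in squared distance."""
    def dist(code):
        return sum((o - c) ** 2 for o, c in zip(output, code))
    return min(codes, key=lambda cls: dist(codes[cls]))

# Hypothetical 3-bit distributed code for four classes; a local (one-hot)
# encoding would instead use four outputs, one per class.
codes = {"A": (0, 0, 0), "B": (0, 1, 1), "C": (1, 0, 1), "D": (1, 1, 0)}

# A noisy network output is still decoded to the intended class.
print(nearest_code((0.2, 0.8, 0.9), codes))  # B
```

Because decoding picks the nearest code word, moderate errors in individual output units need not change the predicted class.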
- The difference between symbolic and connectionist systems lies in their inherent inductive biases, which determine which of the many logical rules consistent with a set of data is actually chosen as the concept definition.
- Regardless of the reason, the data for a number of real problems appear to consist of linearly separable categories, which makes the simple and efficient perceptron a promising learning algorithm.
- Although backpropagation classifies about as well as or better than the other systems, it consistently takes much longer to train.
- Scalability (complexity): training time for ID3 grows linearly with the number of examples, while training a neural network can be an NP-complete problem.
- Other issues include configurability, human interpretability, and robustness in the presence of component failure.
This paper focuses on evaluating algorithms by training time and predictive accuracy. The experiments indicate that the inductive biases of symbolic and neural-net systems are equally suitable for many real-world problems. Detailed conclusions are presented with the experimental results.
ID3: The information-gain criterion that determines the splitting attribute acts as a hill-climbing heuristic, which tends to minimize the size of the resulting decision tree.
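The splitting criterion can be sketched as follows; this is a minimal illustration of information gain on hypothetical toy data, not the paper's implementation:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Reduction in class entropy from splitting `examples` on `attribute`.

    `examples` is a list of dicts mapping attribute names to values;
    ID3 greedily splits on the attribute with the highest gain.
    """
    n = len(examples)
    # Partition the labels by the attribute's value.
    partitions = {}
    for ex, y in zip(examples, labels):
        partitions.setdefault(ex[attribute], []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

# Toy data: 'outlook' perfectly predicts the class, 'windy' does not.
examples = [{"outlook": "sunny", "windy": True},
            {"outlook": "sunny", "windy": False},
            {"outlook": "rain", "windy": True},
            {"outlook": "rain", "windy": False}]
labels = ["yes", "yes", "no", "no"]

print(information_gain(examples, labels, "outlook"))  # 1.0 (perfect split)
print(information_gain(examples, labels, "windy"))    # 0.0 (uninformative)
```

Greedily choosing the highest-gain attribute at each node is what makes the procedure a hill-climbing heuristic: it never backtracks over earlier splits.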
Perceptron:
- It is incapable of learning concepts that are not linearly separable.
- The procedure performs gradient descent (i.e., hill climbing) in weight space in an attempt to minimize the sum of the squares of the errors across all of the training examples.
- Perceptron cycling theorem: once a set of weights repeats, the data is not linearly separable and the algorithm will never converge.
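A minimal sketch of the perceptron learning rule on toy data (the AND/XOR concepts are my own illustration, not the paper's data sets) shows both behaviors: convergence on a linearly separable concept, and endless cycling on a non-separable one.

```python
def train_perceptron(inputs, targets, epochs=100, lr=0.1):
    """Classic perceptron learning rule with a bias weight.

    `targets` are 0/1; returns weights [bias, w1, w2, ...] once an epoch
    ends with every example classified correctly, or None if that never
    happens within `epochs` passes (the weights are cycling).
    """
    w = [0.0] * (len(inputs[0]) + 1)
    for _ in range(epochs):
        errors = 0
        for x, t in zip(inputs, targets):
            xb = [1.0] + list(x)  # prepend a constant bias input
            out = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else 0
            if out != t:
                errors += 1
                w = [wi + lr * (t - out) * xi for wi, xi in zip(w, xb)]
        if errors == 0:
            return w
    return None

AND = ([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1])  # linearly separable
XOR = ([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 1, 1, 0])  # not separable

print(train_perceptron(*AND) is not None)  # True: converges
print(train_perceptron(*XOR) is not None)  # False: no separating weights exist
```

The perceptron convergence theorem guarantees the AND run terminates; for XOR, no epoch can ever be error-free, so the weights revisit earlier values indefinitely.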
Backpropagation:
- It also performs gradient descent, but is capable of doing so for multi-layer networks with hidden units.
- It may get stuck in a local minimum and therefore is not guaranteed to converge to 100% correctness on the training data.
- It may get trapped in an oscillation; however, the momentum term helps prevent oscillations.
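A minimal sketch of backpropagation with a momentum term, using a hypothetical two-hidden-unit sigmoid network trained on XOR (my own illustration, not the paper's networks or data; the learning rate and momentum values are arbitrary):

```python
import math
import random

def train_xor_net(epochs=5000, lr=0.3, momentum=0.9, seed=0):
    """Backpropagation with momentum on a 2-2-1 sigmoid network.

    Each weight update adds `momentum` times the previous update, which
    smooths the descent direction. Returns (initial_loss, final_loss)
    on the XOR training set.
    """
    random.seed(seed)
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    # Hidden layer: 2 units x (bias + 2 inputs); output: bias + 2 hidden.
    W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    W2 = [random.uniform(-1, 1) for _ in range(3)]
    V1 = [[0.0] * 3 for _ in range(2)]  # previous updates (momentum state)
    V2 = [0.0] * 3
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    def loss():
        total = 0.0
        for (x1, x2), t in data:
            h = [sig(w[0] + w[1] * x1 + w[2] * x2) for w in W1]
            y = sig(W2[0] + W2[1] * h[0] + W2[2] * h[1])
            total += (t - y) ** 2
        return total

    first = loss()
    for _ in range(epochs):
        for (x1, x2), t in data:
            xb = (1.0, x1, x2)
            h = [sig(sum(w[i] * xb[i] for i in range(3))) for w in W1]
            hb = (1.0, h[0], h[1])
            y = sig(sum(W2[i] * hb[i] for i in range(3)))
            # Backpropagate the squared-error gradient through the sigmoids.
            d_out = (t - y) * y * (1 - y)
            d_hid = [d_out * W2[j + 1] * h[j] * (1 - h[j]) for j in range(2)]
            for i in range(3):
                V2[i] = lr * d_out * hb[i] + momentum * V2[i]
                W2[i] += V2[i]
            for j in range(2):
                for i in range(3):
                    V1[j][i] = lr * d_hid[j] * xb[i] + momentum * V1[j][i]
                    W1[j][i] += V1[j][i]
    return first, loss()

first, final = train_xor_net()
print(first, final)  # the training error typically drops well below its start
```

Unlike the single-layer perceptron, the hidden units let this network represent XOR at all; the momentum state `V1`/`V2` is what damps oscillation across successive updates.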
The paper is very well written and clearly exhibits the thorough and deep research underlying it. The reasons for choosing the particular techniques, experimental settings, and methodologies are discussed before the results are presented, which keeps the research convincing and conveys a great deal of information.
Each experiment's results are shown in figures, followed by discussion. The authors did a great deal of work analyzing the results and tried to provide reasonable explanations for their conclusions. For issues without a definite explanation, they provide references and relevant ideas from other researchers, which generates much insight.
The comparison was done in a systematic manner, with different topics covered in different experiments. I can easily see that they carefully chose different methods and techniques for different tasks, which reflects a serious research attitude that I really respect.
This paper was written in the early 1990s (1991), and both symbolic and neural learning algorithms have been significantly improved and developed since then (e.g., ID3 has evolved into C4.5/C5.0, and backpropagation has many variants). Although the paper provides a thorough and detailed comparison and analysis of the two learning methodologies, I wonder whether some of the results still hold for the modern versions. Nevertheless, this is a very good paper, both for its research value and its writing style.
I have the impression that the authors have a slight preference for ID3 over backpropagation.
In the ``Completely dropped features'' experiment there is an interesting phenomenon: randomly dropping half the features only slightly impairs the performance of both ID3 and backpropagation. This indicates apparent redundancy in several of the domains. I think this is evidence of the ``funnel theory'' in practice.