Using Machine Learning to Predict Project Effort: Empirical Case Studies in Data-Starved Domains (File Last Modified Wed, May 29, 2002.)
Review: Using Machine Learning to Predict Project Effort: Empirical Case Studies in Data-Starved Domains
Problem Addressed
The author claims that data scarcity is the main obstacle to applying machine learning in software engineering. Consequently, he raises the question: ``How to generate sufficient amounts of data from relatively few projects for project assessment?''
Approach Proposed
The proposed approach is to assess projects from a bottom-up perspective, using estimates gathered from products to predict project effort.
Validation
The author conducts two machine learning experiments using a typical neural network with software cost estimation data from two separate organizations.
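For concreteness, the kind of experiment described, a small feed-forward network regressing project effort on product-level metrics, can be sketched as below. The data, network size, and training details are invented for illustration and are not the paper's; MMRE (mean magnitude of relative error) is a standard accuracy measure in effort estimation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ``product metrics -> effort'' data (invented, not the paper's):
# 40 projects, 3 product-level features each, effort roughly linear in them.
X = rng.uniform(1.0, 10.0, size=(40, 3))
y = X @ np.array([2.0, 1.0, 0.5]) + rng.normal(0, 0.1, 40)

# Standardize inputs and target, the usual practice for small networks.
Xn = (X - X.mean(0)) / X.std(0)
yn = ((y - y.mean()) / y.std()).reshape(-1, 1)

# One hidden layer of tanh units, trained by full-batch gradient descent.
W1 = rng.normal(0, 0.5, (3, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.1
for _ in range(2000):
    h = np.tanh(Xn @ W1 + b1)            # forward pass
    pred = h @ W2 + b2
    err = pred - yn                       # mean-squared-error gradient
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)      # backprop through tanh
    gW1 = Xn.T @ dh / len(X); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# Undo the target scaling, then score with MMRE.
effort_pred = (pred * y.std() + y.mean()).ravel()
mmre = np.mean(np.abs(effort_pred - y) / y)
```

Even a toy setup like this makes the review's complaint concrete: without seeing sample inputs and outputs, or knowing what each reported column (e.g. MMRE) measures, the reader cannot judge the claimed results.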
Comments
The author does touch on an important point: the lack of data is a serious obstacle to applying machine learning in certain domains. But the generality of the proposed approach is not supported. The author claims that estimates gathered from products can be used to accurately estimate programming effort, yet the relationship between product data and programming effort is not sufficiently discussed, so the claim is hard to find convincing. The author also never addresses whether his ``bottom-up perspective'' generalizes to data-starved domains beyond his own case study, even though he presents it as the solution to the problem.

The author presents a case study in which he uses a neural net to estimate project effort from product data. However, a basic claim made in the introduction, that decision-tree induction cannot handle this task, is never supported experimentally. Likewise, the author concludes that his approach outperforms the COCOMO results without sufficient comparison and validation.

The author provides many references to related research before addressing his own approach; the first three pages read like a summary of his literature review rather than a problem description. The section ``Data-Rich Domains'' has little to do with his thesis, and the explanation of ``why data is scarce'' in Section 1 is more detailed than the analysis of his experiments.

Two experiments were conducted using a typical neural net with no novel adaptation of the algorithm or training method. There is no example of input or output data, and the statistical assessment of the results conveys little information to the reader. Furthermore, the experimental results are presented without the necessary explanation of what each column actually means, so it is hard to see the relation between those magic numbers and the author's conclusions. The abstract is too long and repeats information from the introduction.
Sections 1 and 2 could be more succinct; overall, the content could be presented in a four-page paper.
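The COCOMO baseline that the paper claims to outperform can at least be stated concretely. A minimal sketch of the basic COCOMO-81 model (the coefficients and mode names are the published values; the `cocomo_effort` helper is my own illustration):

```python
# Basic COCOMO-81 effort model: E = a * KLOC**b person-months,
# with (a, b) depending on the project mode.
COCOMO_81 = {
    "organic":       (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (3.6, 1.20),
}

def cocomo_effort(kloc: float, mode: str = "organic") -> float:
    """Return estimated effort in person-months for a project of
    `kloc` thousand delivered source lines."""
    a, b = COCOMO_81[mode]
    return a * kloc ** b

# e.g. a 32-KLOC organic project: 2.4 * 32**1.05, roughly 91 person-months.
```

A fair comparison against such a model would need the same inputs, the same projects, and a stated accuracy measure for both, which is exactly what the review finds missing.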
Insight
Reading this paper, it seems to me that the author tried to relate his own work to a far-fetched thesis. His paper reviews some related work in the literature but provides little insight. When writing a paper, it is important to say something that a particular group of readers wants to hear, but it is most important that the author present genuinely insightful research. It is the research work itself that gives a paper academic value. A good eight-page paper may reflect less than 10% of a researcher's work, and a researcher may need to write dozens or even hundreds of papers to become successful. Oh, is this the academic life?