Detecting change in categorical data: mining contrast sets
-
``How do several contrasting groups differ?''
-
A task in data analysis is understanding the differences between several contrasting groups.
Determine significance: test whether the support of a contrast set (cset) is independent of group membership,
using contingency tables and the chi-square test.
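The significance test above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the table layout (rows = cset true/false, columns = groups), the function names, and the example counts are all assumptions. The p-value helper only covers the two-group case (one degree of freedom), and every row and column total is assumed to be non-zero.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_one_df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical counts: the cset holds in 60 of 100 examples of group A
# and in 30 of 100 examples of group B.
table = [[60, 30],   # cset true
         [40, 70]]   # cset false
stat = chi_square_statistic(table)
p = p_value_one_df(stat)
```

A small p-value means the cset's support is unlikely to be independent of group membership.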
Control Type I error: use a different significance threshold at each level of the search tree.
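These notes do not record how the per-level thresholds are chosen. One Bonferroni-style scheme, assumed here purely for illustration, halves the overall alpha budget at each deeper level, divides it by the number of candidates tested at that level, and never lets the cutoff rise again. The function name and the candidate counts are hypothetical.

```python
def level_alphas(alpha, candidates_per_level):
    """Per-level significance cutoffs (illustrative scheme, an assumption).

    Level l gets a budget of alpha / 2**l, shared equally among the
    candidates tested at that level; a cutoff never exceeds the previous one.
    """
    cutoffs = []
    previous = alpha
    for level, n_candidates in enumerate(candidates_per_level, start=1):
        budget = alpha / (2 ** level)
        cutoff = min(budget / n_candidates, previous)
        cutoffs.append(cutoff)
        previous = cutoff
    return cutoffs

# Hypothetical search with 10, 40 and 20 candidates at levels 1, 2 and 3.
counts = [10, 40, 20]
cutoffs = level_alphas(0.05, counts)
```

Under this scheme the budgets summed over all levels stay below the overall alpha, which is the point of tightening the threshold as the search goes deeper.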
Prune: prune a node of the search tree when none of its specializations can be a significant and large cset.
-
Minimum Deviation Size (promising)
-
Expected Cell Frequencies (skew)
-
Chi-square bounds
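The first two pruning checks can be sketched as follows. Both rely on the fact that a specialization never has more support than its parent. The names `min_dev` and `min_expected`, and the default of 5, are assumptions for illustration rather than values from the paper. The chi-square bound is not sketched here.

```python
def below_min_deviation(supports, min_dev):
    """True if no specialization can reach a support difference of min_dev.

    Supports only shrink under specialization, so the largest possible
    difference between groups is bounded by the largest current support.
    """
    return max(supports) < min_dev

def expected_too_small(counts_with_cset, group_sizes, min_expected=5.0):
    """True if some expected count in the 'cset true' row is below min_expected."""
    n = sum(group_sizes)
    row_true = sum(counts_with_cset)
    return any(row_true * size / n < min_expected for size in group_sizes)

def should_prune(counts_with_cset, group_sizes, min_dev, min_expected=5.0):
    """Combine both checks for one node of the search tree."""
    supports = [c / g for c, g in zip(counts_with_cset, group_sizes)]
    return (below_min_deviation(supports, min_dev)
            or expected_too_small(counts_with_cset, group_sizes, min_expected))
```

For example, with two groups of 100 examples and `min_dev = 0.05`, a node whose cset matches 3 and 2 examples is pruned by the first check, and a node matching 6 and 2 examples is pruned by the second.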
Finding surprising csets: when forming the conjunction of N attribute-values, if the actual proportion of the conjunction is close to the proportion expected when those N attribute-values are independent, then the conjunction is not surprising.
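The surprise check described above amounts to comparing an observed proportion with a product of marginal proportions. A minimal sketch, in which the `tolerance` parameter and the example numbers are hypothetical:

```python
import math

def expected_under_independence(marginals):
    """Proportion of the conjunction if the attribute-values were independent."""
    return math.prod(marginals)

def is_surprising(observed, marginals, tolerance):
    """True if the observed proportion deviates from the independence estimate."""
    return abs(observed - expected_under_independence(marginals)) > tolerance

# Two attribute-values that hold in 50% and 40% of a group: independence
# predicts the conjunction in about 20% of the group.
marginals = [0.5, 0.4]
close = is_surprising(0.21, marginals, tolerance=0.05)  # near 0.20
far = is_surprising(0.38, marginals, tolerance=0.05)    # well above 0.20
```

A conjunction flagged as not surprising adds little beyond what its individual attribute-values already show.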
Using the approach (algorithm) described in the paper enables us to detect differences across contrasting groups, a problem that other mining algorithms such as association rule mining do not solve satisfactorily.
This paper defines a problem and describes an algorithm to solve it. It discusses technical details of the algorithm and evaluates it on two datasets. Results of the experiments are presented but no validation is given. In fact, there is no obvious method for validation.
The authors compare their work to others' related research:
-
Compared to association rule mining, their approach is a more understandable and efficient way to solve this kind of problem.
-
Compared to other research that tried to find changes, their work is fundamentally different: both the goals and the results differ.
The paper doesn't mention runtime, and it only briefly mentions the discretization method: continuous attributes are discretized into equal-sized intervals.
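As a reminder of what such a discretization looks like, here is a minimal sketch that reads "equal-sized" as equal width. That reading is an assumption, since equal-frequency binning would be the other interpretation. The function name and `n_bins` are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values identical: everything falls in the first bin.
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

bins = equal_width_bins([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5)
```

Each bin index then acts as a categorical attribute-value for the mining step.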
The experimental output is given as a list with some technical figures. It is hard to tell which sets are more important, and there are more than 100 of them.
-
Contrast sets are in fact what we call treatments. It is the only paper I have seen so far that aims at mining treatments.
-
There is no ordering among the classes. Each contrast set is supposed to represent differences between 2 classes, i.e. there is no preference for any class; a contrast set only separates the classes. This is the primary difference between contrast sets and tar2's treatments.
-
The algorithm automatically finds contrast sets at all levels, and uses pruning methods to decide whether a deeper level is necessary for a node. We can do this in tar2: automatically try nchanges from 1 to N, and terminate the process under a certain condition.
-
The pruning methods we can use now are promising and skew; both need to be predefined. (Use skew to terminate the combination process, as an algorithm optimization.)
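The level-wise idea in the notes above (try conjunctions of size 1 to N, extend only the nodes that survive pruning, stop when nothing survives) can be sketched as follows. This is a simplified illustration, not tar2 or the paper's algorithm. The `should_prune` callback is hypothetical, and the sketch ignores the restriction that a conjunction should not combine two values of the same attribute.

```python
def levelwise_search(items, max_size, should_prune):
    """Enumerate conjunctions level by level, extending only unpruned nodes.

    items: ordered list of distinct attribute-values.
    should_prune: hypothetical callback taking a tuple of items.
    """
    results = []
    frontier = [()]  # level 0: the empty conjunction
    for _size in range(1, max_size + 1):
        candidates = []
        for parent in frontier:
            # Extend only with later items so each set is generated once.
            start = items.index(parent[-1]) + 1 if parent else 0
            for item in items[start:]:
                candidates.append(parent + (item,))
        frontier = [c for c in candidates if not should_prune(c)]
        if not frontier:
            break  # nothing left to specialize: terminate early
        results.extend(frontier)
    return results

# Toy run: any conjunction containing "c" is pruned.
found = levelwise_search(["a", "b", "c"], 3, lambda cset: "c" in cset)
```

In the toy run the search stops at level 3, because the only remaining candidate contains "c".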
Build 11. Apr 12, 2003