Integrating Classification and Association Rule Mining

(File Last Modified Wed, May 29, 2002.)


Review: Integrating Classification and Association Rule Mining

Problem Addressed

Integrate classification rule mining and association rule mining to build a classifier that classifies efficiently and with higher accuracy.

Approach Proposed

  • discretizing continuous attributes.
  • generating all class association rules.
  • building a classifier based on the above rules.

Algorithm to generate a complete set of CARs (class association rules)

  • CAR: X --> Y, where X is a subset of items (treatments) and Y is a class label.
  • confidence c: |cases that satisfy X and are labeled Y| / |cases that satisfy X| = c%
  • support s: |cases that satisfy X and are labeled Y| / |total cases| = s%
  • find ruleitems (<itemset, class label>) that are both frequent and accurate by making multiple passes over the data:
            frequent: ruleitems whose support is above minsup.
            accurate: ruleitems whose confidence is above minconf.
  • the generated rules are pruned using the pessimistic-error-rate-based pruning method of C4.5.
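The multi-pass search for frequent and accurate ruleitems can be sketched as a level-wise (Apriori-style) loop. This is a minimal sketch, not the paper's code, and it assumes: cases are (frozenset of items, class label) pairs; minsup and minconf are fractions compared with `>=`; the function name `generate_cars` is hypothetical. It keeps every accurate ruleitem rather than only the most confident one per itemset, and it omits the C4.5 pessimistic-error pruning.

```python
from collections import Counter
from itertools import combinations


def generate_cars(cases, minsup, minconf):
    """Level-wise search for class association rules (CARs).

    cases:   list of (items, label) pairs, where items is a frozenset
    minsup:  minimum support, as a fraction of all cases
    minconf: minimum confidence, as a fraction
    Returns a list of (condset, label, support, confidence) tuples.
    """
    n = len(cases)
    cars = []
    # First pass: every single item is a candidate condition set.
    candidates = {frozenset([item]) for items, _ in cases for item in items}
    k = 1
    while candidates:
        cond_count = Counter()  # |cases that satisfy X|
        rule_count = Counter()  # |cases that satisfy X and are labeled Y|
        for items, label in cases:
            for cond in candidates:
                if cond <= items:
                    cond_count[cond] += 1
                    rule_count[(cond, label)] += 1
        frequent = set()
        for (cond, label), count in rule_count.items():
            support = count / n
            if support >= minsup:  # frequent ruleitem
                frequent.add(cond)
                confidence = count / cond_count[cond]
                if confidence >= minconf:  # accurate ruleitem
                    cars.append((cond, label, support, confidence))
        # Next pass: join frequent k-condsets into (k+1)-condsets, keeping only
        # candidates whose k-subsets are all frequent (Apriori property).
        k += 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return cars
```

Each pass over the data handles condition sets of one size. Because every subset of a frequent ruleitem's condition set is itself frequent for the same class, pruning candidates against the frequent condition sets of the previous pass never loses a rule.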

Algorithm to build a classifier

  • choose a set of high-precedence rules from the CARs to cover the training data (a rule with higher confidence takes precedence; for equal confidence, the one with higher support does).
  • the algorithm satisfies two conditions:
           each training case is covered by the highest-precedence rule among those that can cover it.
           every rule chosen correctly classifies at least one remaining training case.
  • discard those chosen rules that do not improve the classifier's accuracy.
  • an improved algorithm completes the task with only slightly more than one pass over the data, instead of one pass over the remaining data for each rule.
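The straightforward version of this selection (one pass over the remaining data per rule, not the improved algorithm) can be sketched as follows. It is a minimal sketch under stated assumptions: the names `precedence_key`, `build_classifier` and `classify` and the rule tuple layout (condset, label, support, confidence, generation order) are hypothetical; after each kept rule the sketch records a default class (majority class of the still-uncovered cases) and the total error count, then cuts the list at the lowest total, which is how "discard rules that don't improve accuracy" is modeled here. Falling back to the overall majority class when no cases remain is my own assumption.

```python
from collections import Counter


def precedence_key(rule):
    """Sort key: higher confidence first, then higher support, then earlier generation."""
    cond, label, support, confidence, order = rule
    return (-confidence, -support, order)


def build_classifier(rules, cases):
    """Greedy rule selection; returns (ordered list of (condset, label), default class)."""
    overall_majority = Counter(y for _, y in cases).most_common(1)[0][0]
    remaining = list(cases)
    chosen = []      # entries: (condset, label, default_class, total_errors)
    rule_errors = 0  # misclassifications by chosen rules on the cases they covered
    for cond, label, *_ in sorted(rules, key=precedence_key):
        covered = [y for items, y in remaining if cond <= items]
        if label not in covered:
            continue  # keep a rule only if it correctly classifies a remaining case
        rule_errors += sum(1 for y in covered if y != label)
        remaining = [(items, y) for items, y in remaining if not cond <= items]
        if remaining:
            default, hits = Counter(y for _, y in remaining).most_common(1)[0]
        else:
            default, hits = overall_majority, 0  # assumption: fall back to global majority
        total_errors = rule_errors + len(remaining) - hits
        chosen.append((cond, label, default, total_errors))
    if not chosen:
        return [], overall_majority
    # Cut the list at the first rule with the lowest total error count.
    best = min(range(len(chosen)), key=lambda i: chosen[i][3])
    return [(c, y) for c, y, _, _ in chosen[: best + 1]], chosen[best][2]


def classify(classifier, items):
    """Apply the first matching rule; otherwise return the default class."""
    rule_list, default = classifier
    for cond, label in rule_list:
        if cond <= items:
            return label
    return default
```

Sorting once by precedence means the first chosen rule that matches a case is also the highest-precedence rule covering it, which gives the first condition above; the `label not in covered` check gives the second.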

Experiment results

  • this classifier was run on 26 datasets from the UCI repository. It outperforms C4.5 on 16 of them, and its average error rate over all 26 is lower than that of C4.5.
  • runtime is on the order of seconds when all data is kept in memory.

Conclusion

Insights

  • this paper mentions its use of an entropy method to discretize continuous attributes, which I plan to look at.
  • in tar2, skew is similar to support, and could be used to optimize the combination process.
  • question: what is the goal of tar2? (What kinds of results do we want to mine from the data: classifications, discriminations, or associations?)
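The core idea behind entropy-based discretization is to place cut points where they best separate the classes. The sketch below finds only a single binary cut on one continuous attribute, choosing the midpoint that minimizes the weighted class entropy of the two resulting intervals. Full entropy-based methods typically apply such cuts recursively with a stopping criterion (for example, MDL), which is omitted here; the function names are hypothetical.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class distribution in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Return the cut point minimizing the weighted class entropy of the two intervals."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut can only fall between distinct values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * class_entropy(left) + len(right) * class_entropy(right)) / n
        if score < best_score:
            best_cut, best_score = (pairs[i - 1][0] + pairs[i][0]) / 2, score
    return best_cut
```

For values 1, 2, 3 labeled "n" and 10, 11, 12 labeled "y", the returned cut is 6.5, since it leaves both intervals pure.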

Build 11. Apr 12, 2003

