Mining Confident Rules Without Support Requirement
-
An open problem is to find all rules that satisfy a minimum confidence but not necessarily a minimum support.
-
With only the confidence requirement available, the widely used support-based pruning strategy does not apply.
A confidence-based pruning strategy is proposed that exploits the universal-existential upward closure property of confidence.
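To make the property concrete: as I read it, the closure says that if a rule is confident, then for every attribute not in the rule, at least one specialization of the rule by a value of that attribute is also confident. This holds because a rule's confidence is the support-weighted average of the confidences of its specializations on any single attribute. The contrapositive is what allows pruning: if no specialization on some attribute is confident, the rule itself cannot be confident. Below is a minimal Python sketch on a made-up toy table; the data and the helper names `confidence` and `specializations` are illustrative assumptions, not the authors' code.

```python
from fractions import Fraction

# Toy table: each row is (attribute values, class label). Made-up data.
ROWS = [
    ({"A": 1, "B": 0}, "c"),
    ({"A": 1, "B": 0}, "c"),
    ({"A": 1, "B": 1}, "c"),
    ({"A": 1, "B": 1}, "d"),
    ({"A": 0, "B": 1}, "d"),
]

def matches(attrs, lhs):
    """True if the row's attribute values satisfy every condition in lhs."""
    return all(attrs.get(a) == v for a, v in lhs.items())

def confidence(rows, lhs, cls):
    """Confidence of the rule lhs -> cls, or None if no row matches lhs."""
    covered = [label for attrs, label in rows if matches(attrs, lhs)]
    if not covered:
        return None
    return Fraction(covered.count(cls), len(covered))

def specializations(rows, lhs, attr):
    """Rules obtained by adding attr=value to lhs, for values seen in matching rows."""
    values = sorted({attrs[attr] for attrs, _ in rows if matches(attrs, lhs)})
    return [{**lhs, attr: v} for v in values]

rule = {"A": 1}
conf_rule = confidence(ROWS, rule, "c")
spec_confs = [confidence(ROWS, s, "c") for s in specializations(ROWS, rule, "B")]

# The best specialization on B is at least as confident as the rule itself.
print(conf_rule, spec_confs, max(spec_confs) >= conf_rule)
```

Here the rule A=1 -> c has confidence 3/4, and its B-specializations have confidences 1 (B=0) and 1/2 (B=1), so at least one specialization meets any threshold the rule itself meets.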
The bottleneck of this approach is that memory is often too small to hold all candidates/rules. The authors present a clustering scheme and an access method that cluster the candidates/rules on disk in the order in which they are requested. They designed a disk-based implementation to minimize the I/O cost, which dominates as in a typical database environment.
The disk-based implementation:
- hash-partition scheme
- buffer allocation
- blocking heuristics
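These notes do not record the details of the authors' hash-partition scheme, so the following is only a generic Python sketch of hash partitioning: candidates are assigned to a fixed number of buckets by a stable hash of their left-hand side, so that each bucket can be processed separately when the whole set does not fit in memory. The names `bucket_of` and `partition`, the key encoding, and the bucket count are all assumptions for illustration.

```python
import zlib
from collections import defaultdict

def bucket_of(lhs, num_buckets):
    """Map a rule's left-hand side to a bucket id using a stable hash."""
    # Sort the conditions so the same LHS always yields the same key.
    key = ",".join(f"{a}={v}" for a, v in sorted(lhs.items()))
    return zlib.crc32(key.encode("utf-8")) % num_buckets

def partition(candidates, num_buckets):
    """Group candidate left-hand sides by bucket id."""
    buckets = defaultdict(list)
    for lhs in candidates:
        buckets[bucket_of(lhs, num_buckets)].append(lhs)
    return buckets

candidates = [{"A": 1}, {"B": 0}, {"A": 1, "B": 0}, {"B": 0, "A": 1}]
buckets = partition(candidates, num_buckets=4)
for bucket_id in sorted(buckets):
    print(bucket_id, buckets[bucket_id])
```

In a disk-based setting each bucket would be written to its own file; here the buckets are in-memory lists to keep the sketch short. `zlib.crc32` is used instead of Python's built-in `hash` because the latter is randomized per process for strings.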
The authors conducted experiments to evaluate the proposed algorithm, and compared their algorithm with Dense-Miner.
-
Environment: Pentium III 500 MHz, 512 MB memory, Windows NT 4.0
-
Synthetic databases: 9 attributes, 10 classes, 100k cases (discretization: equal-width interval partition)
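For reference, equal-width interval partitioning splits the range of a numeric attribute into bins of equal width and replaces each value by its bin index. A minimal Python sketch follows; the function name `equal_width_bins` and the number of bins are assumptions, since the notes do not record how many intervals were used.

```python
def equal_width_bins(values, num_bins):
    """Replace each numeric value by the index of its equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into one bin.
        return [0 for _ in values]
    width = (hi - lo) / num_bins
    # The maximum value would land in bin num_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]

print(equal_width_bins([0, 2.5, 5, 7.5, 10], num_bins=4))  # [0, 1, 2, 3, 3]
```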
-
Effectiveness of the pruning strategies: 2 out of every 3 candidates generated are actually confident rules.
-
Effectiveness of the partition scheme, blocking heuristics, and buffer allocation (100k database, 1500 sec).
-
Scalability: as the database size increases, the execution time increases linearly; as the dimension increases, both the number of rules and the execution time grow quickly.
-
Confident rules can be mined with a confidence-based pruning strategy that exploits the universal-existential upward closure. This pruning method often yields a very tight search space, in the sense that out of every three candidates generated, two are actually confident.
-
By addressing several performance-related issues, i.e., data partitioning and data blocking, a disk-based implementation of the pruning strategy is proposed and shown to achieve superior performance compared to existing methods.
-
Related work: According to the authors, they are the first to mine confident rules where i) the LHS can have multiple values and ii) attributes are not restricted to certain domains. The only other algorithm that can find confident rules without a support requirement is relatively ineffective.
-
The authors claim that mining confident rules without a support requirement is significant because, with a high minimum support, the discovered rules are often obvious and well known, while rules of low support but high confidence, which usually provide new insights, are not discovered.
-
The number of confident rules generated by their system is too large, between 1e+6 and 2e+6, and I doubt the practical value of so many rules. Without a support requirement, evaluating these rules is a serious issue. For example, if there are 9 attributes in total in the dataset, any unique k-tuple is a confident k-rule (k=9) with confidence=100%, but can we say that this kind of rule is significant? Clearly, there is a large amount of redundancy in the rules.
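The unique-tuple concern can be checked on a toy example. The Python sketch below uses 3 attributes instead of 9 for brevity; the data and the function name `full_tuple_rule_confidences` are made up for illustration. Every full-length tuple that occurs only once yields a rule with 100% confidence that is supported by a single case.

```python
from collections import defaultdict

def full_tuple_rule_confidences(rows):
    """Map each full attribute tuple to {class label: confidence}."""
    counts = defaultdict(lambda: defaultdict(int))
    for attrs, label in rows:
        counts[attrs][label] += 1
    result = {}
    for attrs, by_class in counts.items():
        total = sum(by_class.values())
        result[attrs] = {label: n / total for label, n in by_class.items()}
    return result

# Made-up cases: (attribute tuple, class label). Each tuple occurs exactly once.
ROWS = [
    ((1, 2, 3), "c"),
    ((1, 2, 4), "d"),
    ((5, 2, 3), "c"),
]

confs = full_tuple_rule_confidences(ROWS)
for attrs, by_class in confs.items():
    print(attrs, by_class)
```

Each rule here has confidence 1.0 but covers only one case, which is exactly the kind of rule a minimum support threshold would have filtered out.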
-
According to their algorithm, most of the confident rules are large, because the more attributes a rule contains, the more likely it is to be confident. This contrasts with rules that satisfy both minimum confidence and minimum support.
Build 11. Apr 12, 2003