1 of 22

Finding Frequent Patterns and Association Rules

  • Using WEKA – FP-Growth Algorithm
  • Course: Databases and Data Mining
  • Instructor: Jamolbek Mattiev

2 of 22

Learning Objectives

  • Understand the FP-Growth algorithm
  • Compare FP-Growth with Apriori
  • Generate association rules in WEKA
  • Interpret support and confidence

3 of 22

Frequent Pattern Mining

  • Discover frequent itemsets
  • Analyze transactional data
  • Support-based pattern discovery

4 of 22

Limitations of Apriori

  • Multiple database scans (one per itemset length)
  • Large number of candidate itemsets generated
  • High computational cost
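
To give a feel for the candidate-generation cost, the sketch below counts how many candidate 2-itemsets Apriori forms from n frequent single items (every pair is a candidate). The function name candidate_pairs is an illustrative choice, not part of any library.

```python
from math import comb

def candidate_pairs(num_frequent_items):
    """Number of candidate 2-itemsets built from the frequent single items."""
    # Apriori joins every frequent item with every other one: n choose 2.
    return comb(num_frequent_items, 2)

for n in (10, 100, 1000):
    print(n, "frequent items ->", candidate_pairs(n), "candidate pairs")
```

Each of these candidates must then be counted against the database, which is where the repeated scans come from.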

5 of 22

FP-Growth Overview

  • FP = Frequent Pattern
  • No candidate generation
  • Uses FP-Tree structure
  • Typically more efficient than Apriori, especially at low support thresholds

6 of 22

FP-Tree Structure

  • Compact prefix-tree
  • Stores frequency information
  • Built with two database scans
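
A minimal sketch of the two-scan construction, assuming transactions are plain Python lists of item names and the threshold is an absolute count. The Node class and build_fp_tree function are illustrative names, not WEKA's implementation.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Scan 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    # Scan 2: insert each transaction, keeping only frequent items,
    # ordered by descending frequency so shared prefixes overlap.
    root = Node(None)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent

# Toy data (hypothetical): four small baskets.
baskets = [["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["a", "b", "c"]]
root, frequent = build_fp_tree(baskets, min_count=2)
print(sorted(root.children))  # items that start a path from the root
```

Because every basket contains "b", the most frequent item, all four transactions share a single first node; that sharing is what makes the tree compact.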

7 of 22

FP-Growth Steps

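  1. Scan the dataset and compute the support of each item
  2. Build the FP-tree from transactions sorted by descending item frequency
  3. Generate a conditional pattern base for each frequent item
  4. Extract frequent patterns recursively from the conditional FP-trees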
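
The four steps can be sketched in a compact, unoptimized form as follows. This is a teaching sketch under stated assumptions (absolute minimum count, transactions as lists of hashable items); all names are illustrative, and WEKA's FPGrowth is implemented differently.

```python
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted, min_count):
    """Steps 1-2: count items, then insert ordered transactions into a tree."""
    counts = Counter()
    for items, weight in weighted:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, weight in weighted:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += weight
    return header, frequent

def fp_growth(weighted, min_count, suffix=frozenset()):
    """Steps 3-4: yield (itemset, count) by recursing on conditional pattern bases."""
    header, frequent = build_tree(weighted, min_count)
    for item, count in frequent.items():
        pattern = suffix | {item}
        yield pattern, count
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_count, pattern)

def mine_frequent_itemsets(transactions, min_count):
    """Convenience wrapper: every input transaction has weight 1."""
    return dict(fp_growth([(t, 1) for t in transactions], min_count))

# Toy data (hypothetical): four small baskets.
baskets = [["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["a", "b", "c"]]
for itemset, count in sorted(mine_frequent_itemsets(baskets, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Note that no candidate itemsets are enumerated: each pattern is grown from a suffix item using only the prefix paths that actually occur in the data.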

8 of 22

Support and Confidence

  • Support(A) = (number of transactions containing A) / (total number of transactions)
  • Confidence(A→B) = Support(A∪B) / Support(A)
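
As a worked example of the two formulas, the sketch below computes them on a hypothetical four-basket dataset. The data and function names are made up for illustration.

```python
# Toy transactions (hypothetical data for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support(A ∪ B) / Support(A) for the rule A → B."""
    a = set(antecedent)
    return support(a | set(consequent), transactions) / support(a, transactions)

# bread appears in 3 of 4 baskets; bread and milk together in 2 of 4.
print(support({"bread"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

Here the rule bread → milk holds in 2 of the 3 baskets that contain bread, so its confidence is 2/3 while its support is 2/4.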

9 of 22

Advantages of FP-Growth

  • Typically faster than Apriori
  • No candidate generation
  • Efficient for large datasets

10 of 22

WEKA: Step 1

  • Open WEKA Explorer
  • Load transactional dataset (.arff)
  • Go to Associate tab
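
If the transactions are not yet in ARFF form, a small script can produce the file. The sketch below assumes one binary nominal attribute per item with values {f,t}, where t marks presence, and item names without spaces or special characters; check this encoding against the FPGrowth options in your WEKA version. The function name and relation name are illustrative.

```python
def transactions_to_arff(transactions, relation="baskets"):
    """Render transactions as ARFF text with one {f,t} attribute per item."""
    items = sorted({item for t in transactions for item in t})
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {item} {{f,t}}" for item in items]
    lines += ["", "@data"]
    for t in transactions:
        present = set(t)
        # One row per transaction: t if the item is in the basket, else f.
        lines.append(",".join("t" if item in present else "f" for item in items))
    return "\n".join(lines) + "\n"

# Toy data (hypothetical).
arff_text = transactions_to_arff([["bread", "milk"], ["milk"]])
print(arff_text)
```

The returned text can be written to a .arff file and opened from the Preprocess tab.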

11 of 22

WEKA: Step 2

  • Click Choose
  • Select FPGrowth
  • Set minimum support & confidence

12 of 22

WEKA: Step 3

  • Configure number of rules
  • Adjust thresholds
  • Click Start

13 of 22

Interpreting WEKA Output

  • Frequent itemsets
  • Association rules
  • Support and confidence values

14 of 22

Effect of Support Threshold

  • High support → fewer patterns
  • Low support → many patterns
  • Balance quality vs quantity
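
The effect can be seen on a tiny hypothetical dataset by counting frequent itemsets at several thresholds. This brute-force sketch enumerates every itemset, so it is only suitable for toy data; the names are illustrative.

```python
from itertools import combinations

# Toy transactions (hypothetical data for illustration only).
transactions = [{"a", "b"}, {"b", "c", "d"}, {"a", "b", "d"}, {"a", "b", "c"}]

def count_frequent_itemsets(transactions, min_support):
    """Count itemsets whose relative support is at least `min_support`."""
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    total = 0
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            hits = sum(set(candidate) <= t for t in transactions)
            if hits / n >= min_support:
                total += 1
    return total

for threshold in (0.25, 0.5, 0.75):
    print(threshold, count_frequent_itemsets(transactions, threshold))
```

On this data the number of frequent itemsets falls from 13 to 7 to 3 as the threshold rises from 0.25 to 0.5 to 0.75.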

15 of 22

Performance Comparison

  • FP-Growth is typically faster than Apriori
  • Better scalability
  • Suitable for large transactional data

16 of 22

Applications

  • Market basket analysis
  • E-commerce recommendation
  • Web clickstream mining
  • Healthcare pattern discovery

17 of 22

Challenges

  • Memory usage for large trees
  • Parameter tuning
  • Interpreting many rules

18 of 22

Experimental Design in WEKA

  • Compare Apriori and FP-Growth
  • Use the same support/confidence thresholds for both
  • Measure runtime
  • Compare rule quality
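
When runs are scripted rather than started from the Explorer, runtime can be measured with a small harness like the one below. It is a generic sketch: dummy_miner is a hypothetical stand-in for whichever Apriori or FP-Growth call is being compared.

```python
import time

def best_runtime(miner, *args, repeats=5):
    """Return the fastest of `repeats` wall-clock runs of miner(*args), in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        miner(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)

# Hypothetical stand-in for a real mining function.
def dummy_miner(transactions, min_support):
    return [t for t in transactions if len(t) >= 2]

elapsed = best_runtime(dummy_miner, [["a", "b"], ["c"]], 0.5)
print(f"best of 5 runs: {elapsed:.6f} s")
```

Taking the best of several runs reduces noise from other processes; both algorithms should be timed on the same data with the same thresholds.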

19 of 22

Discussion Questions

  • Why is FP-Growth faster?
  • When would you use Apriori instead?
  • How should the thresholds be chosen?

20 of 22

Summary

  • FP-Growth avoids candidate generation
  • Uses FP-tree for compression
  • Efficient frequent pattern mining
  • WEKA supports easy experimentation

21 of 22

Support Threshold Impact (Example)

22 of 22

Apriori vs FP-Growth Runtime (Example)