Associate Rule Mining
FP – Growth Algorithm
FP-growth (frequent pattern growth): Mining Frequent Itemsets without Candidate Generation
Example:
Step 1: Count Item Frequencies : We first count the frequency of each item in the entire dataset.
Item | Frequency | Priority |
I1 | 6 | 2 |
I2 | 7 | 1 |
I3 | 6 | 3 |
I4 | 2 | 4 |
I5 | 2 | 5 |
Step 2: Sort Items in Transactions Based on Frequency: Now we reorder each transaction in descending order of frequency. If two items have the same frequency, you can keep their original order.
TID | Items |
T1 | I2, I1, I5 |
T2 | I2, I4 |
T3 | I2, I3 |
T4 | I2, I1, I4 |
T5 | I1, I3 |
T6 | I2, I3 |
T7 | I1, I3 |
T8 | I2, I1, I3, I5 |
T9 | I2, I1, I3 |
Item | Frequency | Priority |
I1 | 6 | 2 |
I2 | 7 | 1 |
I3 | 6 | 3 |
I4 | 2 | 4 |
I5 | 2 | 5 |
TID | Items |
T1 | I2, I1, I5 |
T2 | I2, I4 |
T3 | I2, I3 |
T4 | I2, I1, I4 |
T5 | I1, I3 |
T6 | I2, I3 |
T7 | I1, I3 |
T8 | I2, I1, I3, I5 |
T9 | I2, I1, I3 |
TID | Items |
T1 | I2, I1, I5 |
T2 | I2, I4 |
T3 | I2, I3 |
T4 | I2, I1, I4 |
T5 | I1, I3 |
T6 | I2, I3 |
T7 | I1, I3 |
T8 | I2, I1, I3, I5 |
T9 | I2, I1, I3 |
NULL
I2:1
I1:1
I5:1
NULL
I2:2
I1:1
I5:1
I4:1
NULL
I2:3
I1:1
I5:1
I4:1
I3:1
NULL
I2:4
I1:2
I5:1
I4:1
I3:1
I4:1
NULL
I2:4
I1:2
I5:1
I4:1
I3:1
I1:1
I3:1
NULL
I2:5
I1:2
I5:1
I4:1
I3:2
I1:1
I3:1
NULL
I2:6
I1:3
I5:1
I4:1
I3:2
I1:2
I3:2
I3:1
I5:1
NULL
I2:7
I1:4
I5:1
I4:1
I3:2
I1:2
I3:2
I3:2
I5:1
Step 3: Construct the FP-Tree: We start with a root node and insert transactions one by one.
TID | Items |
T1 | I2, I1, I5 |
T2 | I2, I4 |
T3 | I2, I3 |
T4 | I2, I1, I4 |
T5 | I1, I3 |
T6 | I2, I3 |
T7 | I1, I3 |
T8 | I2, I1, I3, I5 |
T9 | I2, I1, I3 |
Step 4: Mine the FP-Tree for Frequent Patterns
Mining Various Kinds of Association Rules:Mining Multilevel Association Rules
Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.
Using uniform minimum support for all levels (referred to as uniform support):
Using item or group-based minimum support (referred to as group-based support):
Because users or experts often have insight as to which groups are more important than others, it is sometimes more desirable to set up user-specific, item, or group based minimal support thresholds when mining multilevel rules.
For example, a user could set up the minimum support thresholds based on product price, or on items of interest, such as by setting particularly low support thresholds for laptop computers and flash drives in order to pay particular attention to the association patterns containing items in these categories.
Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
Example:
buys(X, “digital camera”) ⇒ buys(X, “HP printer”)
Example:
age(X, “20...29”) ∧ occupation(X, “student”) ⇒ buys (X,“laptop”)
Example:
age(X, “20...29”)∧buys(X, “laptop”)⇒buys(X, “HP printer”)
Mining Multidimensional Association Rules Using Static Discretization of Quantitative Attributes
Graph Pattern Mining
What is a graph: A graph is a set of vertices and edges, having some labels.
Types of graphs: connected and disconnected
A connected graph in this context is a graph where it is possible to go from any node to any node cities by following the roads. If a graph is not connected, it is said to be a disconnected graph.
Types of graphs: directed and undirected
a graph where vertices and edges. Some edges can be travelled in both directions(directed graph) while some edges may be travelled in only a single direction is known as undirected graph.
Frequent Subgraph Mining
A frequent subgraph is a subgraph that appears in at least minsup graphs from a graph database.
set the minsup parameter to 3
subgraph is frequent and is said to have a support (a frequency) of 3 since it appears in three of the input graphs.
the minsup parameter is generally set by trial and error. If this parameter is set too high, few subgraphs will be found, while if it is set too low, hundred or millions of subgraphs may be found, depending on the input database.
Mining frequent subgraphs in a single graph
These subgraphs are said to be frequent because they appear at least twice in the input graph.
Algorithms Used: Gspan, Closed Graph
Applications
Sequential Pattern Mining
Ex: Consider the following sequence database, representing the purchases made by customers in a retail store.
CONSTRAINT BASED ASSOCIATION RULES:
User Flexibility: provides constraints on what to be mined.
System Optimization: explores constraints to help efficient mining.
The constraints can include the following:
Knowledge type constraints: These specify the type of knowledge to be mined, such as association or correlation.
Data constraints: These specify the set of task-relevant data.
Dimension/level constraints: These specify the desired dimensions (or attributes) of the data, or levels of the concept hierarchies, to be used in mining.
Interestingness constraints: These specify thresholds on statistical measures of rule interestingness, such as support, confidence, and correlation.
Rule constraints: These specify the form of rules to be mined. Such constraints may be expressed as rule templates, as the maximum or minimum number of predicates that can occur in the rule antecedent or consequent, or as relationships among attributes, attribute values, and/or aggregates. The constraints can be specified using a high-level declarative data mining query language and user interface.