1 of 32

Machine Learning II:

Association Rules and Recommendation

Patrick Hall

Visiting Faculty, Department of Decision Sciences

George Washington University

2 of 32

Lecture 11 Agenda

  • Association Rules
    • Rule Evaluation Metrics
  • Market Basket Analysis
  • Interlude: Supervised Rule-based Models
  • Recommendations
    • Netflix Case
  • Readings

3 of 32

Where are we in the modeling lifecycle?

[Diagram: the modeling lifecycle: Data Collection & ETL, Feature Selection & Engineering, Supervised Learning, Unsupervised Learning, Assessment & Validation, and Deployment, arranged along a spectrum from cost intensive to revenue generating.]

4 of 32

Association Rules

Overview

Rule Evaluation Metrics

5 of 32

Overview: Association Rules

  • Association analysis is useful for discovering interesting relationships hidden in large data sets:
    • Undiscovered relationships can be represented in the form of sets of items known as frequent itemsets or association rules.
  • Association rules suggest relationships in transactions that can be used in many business practices.
  • Two key issues that must be addressed in association analysis:
    • Discovering patterns from large transactional datasets can be computationally expensive.
    • Some of the discovered patterns may be spurious (random).

Source: Introduction to Data Mining

6 of 32

Mining Association Rules

Two-step approach:

  1. Frequent itemset generation: Generate all itemsets whose support ≥ the minimum support hyperparameter.
  2. Rule generation: Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset.

Frequent itemset generation is computationally expensive! (A minimal sketch follows below.)

Source: Introduction to Data Mining
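Step 1 can be sketched in a few lines of Python. The transactions below are illustrative (chosen to match the support counts quoted on the following slides), and the brute-force counting stands in for the candidate pruning and smarter counting a real Apriori implementation uses:

    # Illustrative market-basket transactions.
    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]
    min_support = 0.4
    n = len(transactions)

    def support(itemset):
        """Fraction of transactions that contain every item in itemset."""
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent individual items.
    frequent = [frozenset({i}) for i in set().union(*transactions)
                if support({i}) >= min_support]

    # Level k: join frequent (k-1)-itemsets into k-item candidates, then
    # keep those that meet min_support. (Real Apriori also prunes any
    # candidate with an infrequent subset before counting.)
    k = 2
    while frequent:
        print(f"Frequent {k - 1}-itemsets:", [set(s) for s in frequent])
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        frequent = [c for c in candidates if support(c) >= min_support]
        k += 1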

7 of 32

Association Rules Mining

Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

Source: Introduction to Data Mining

Market-Basket Transactions

[Table: see the illustrative transaction set in the sketch on the previous slide.]

Examples of Association Rules:

  • {Diaper} → {Beer}
  • {Milk, Bread} → {Eggs, Coke}
  • {Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!

8 of 32

Terms and Definitions

  • Itemset:
    • A collection of one or more items
      • Example: {Milk, Bread, Diaper}
    • k-itemset
      • An itemset that contains k items
  • Frequent itemset:
    • An itemset whose support is greater than or equal to a minimum support hyperparameter
  • Support count (σ):
    • Frequency of occurrence of an itemset
    • E.g., σ({Milk, Bread, Diaper}) = 2
  • Support (s):
    • Fraction of transactions that contain an itemset
    • E.g., s({Milk, Bread, Diaper}) = 2/5

Source: Introduction to Data Mining

9 of 32

Rule Evaluation Metrics

  • Association Rules:
    • An expression of the form X → Y, where X and Y are itemsets.
    • Example: {Milk, Diaper} → {Beer}
  • Rule Evaluation Metrics:
    • Support (s): fraction of transactions that contain both X and Y.
    • Confidence (c): measures how often items in Y appear in transactions that contain X.
    • Expected confidence (E[c]): fraction of transactions that contain Y.
    • Lift (l): measure of association between itemsets: confidence divided by expected confidence (see the formulas below).

Source: Introduction to Data Mining
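In symbols, for a rule X → Y mined from a transaction set T, with σ(·) denoting the support count:

    s(X → Y) = σ(X ∪ Y) / |T|
    c(X → Y) = σ(X ∪ Y) / σ(X)
    E[c](X → Y) = σ(Y) / |T|
    l(X → Y) = c / E[c]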


10 of 32

Association Rules

Example of Rules:

  • {Milk, Diaper} → {Beer} (s=0.4, c=0.67)
  • {Milk, Beer} → {Diaper} (s=0.4, c=1.0)
  • {Diaper, Beer} → {Milk} (s=0.4, c=0.67)
  • {Beer} → {Milk, Diaper} (s=0.4, c=0.67)
  • {Diaper} → {Milk, Beer} (s=0.4, c=0.5)
  • {Milk} → {Diaper, Beer} (s=0.4, c=0.5)

Observations:

  • All the rules are binary partitions of the same itemset: {Milk, Diaper, Beer}.
  • Rules originating from the same itemset have identical support but can have different confidence.
  • Thus, we may decouple the support and confidence requirements.

Source: Introduction to Data Mining

11 of 32

Rule Evaluation Metrics: Support

Support is the probability that the two itemsets in a rule occur together.

Itemset: {KLYT6, IA45Y}

Rule: {KLYT6} → {IA45Y}

KLYT6 and IA45Y occur together in 4 of the 5 transactions below:

s = σ({KLYT6, IA45Y}) / |T| = 4/5 = 0.8

Support is symmetric:

s({KLYT6} → {IA45Y}) = s({IA45Y} → {KLYT6})

Transaction ID   Products
11231            IA45Y, DDD7M, BDR5U0, KLYT6
11232            TTR45H, HJL9B, DDD7M, IA45Y, BDR5U0, KLYT6
11233            KLYT6, IA45Y, DDD7M
11234            KLYT6, BDR5U0
11235            DDD7M, KLYT6, BDR5U0, IA45Y

12 of 32

Rule Evaluation Metrics: Confidence

Confidence is the conditional probability that a transaction contains itemset Y given that it contains itemset X.

Rule: {KLYT6} → {IA45Y}

4 of the 5 transactions that contain KLYT6 (X) also contain IA45Y (Y):

c = σ({KLYT6, IA45Y}) / σ({KLYT6}) = 4/5 = 0.8

(The transaction table on the previous slide applies here as well.)

13 of 32

Rule Evaluation Metrics: Expected Confidence

Expected confidence is the proportion of transactions that contain itemset Y: the confidence the rule would have if X and Y were independent.

Rule: {KLYT6} → {IA45Y}

4 of the 5 transactions contain IA45Y:

E[c] = σ({IA45Y}) / |T| = 4/5 = 0.8

(The transaction table on the Support slide applies here as well.)

14 of 32

Rule Evaluation Metrics: Lift

Lift is the confidence of the rule divided by the expected confidence.

Rule: {KLYT6} → {IA45Y}

l = c / E[c] = 0.8 / 0.8 = 1

Lift can be interpreted as a measure of association between the two itemsets: values greater than 1 indicate positive association, a value of 1 indicates no association (independence), and values less than 1 indicate negative association.

(The transaction table on the Support slide applies here as well.)
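All four metrics for this running example can be verified in a few lines of Python, using the transaction table above:

    # The transactions from the table, keyed by transaction ID.
    transactions = {
        11231: {"IA45Y", "DDD7M", "BDR5U0", "KLYT6"},
        11232: {"TTR45H", "HJL9B", "DDD7M", "IA45Y", "BDR5U0", "KLYT6"},
        11233: {"KLYT6", "IA45Y", "DDD7M"},
        11234: {"KLYT6", "BDR5U0"},
        11235: {"DDD7M", "KLYT6", "BDR5U0", "IA45Y"},
    }
    n = len(transactions)
    X, Y = {"KLYT6"}, {"IA45Y"}

    def sigma(itemset):
        """Support count: transactions that contain every item in itemset."""
        return sum(itemset <= t for t in transactions.values())

    support = sigma(X | Y) / n            # 4/5 = 0.8
    confidence = sigma(X | Y) / sigma(X)  # 4/5 = 0.8
    expected_c = sigma(Y) / n             # 4/5 = 0.8
    lift = confidence / expected_c        # 0.8 / 0.8 = 1.0: no association
    print(support, confidence, expected_c, lift)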

15 of 32

Market Basket Analysis

Overview

16 of 32

Market Basket Analysis

“Association” analysis typically refers to products that are used one-per-customer, like personal banking products. (I have one checking account and one savings account.)

“Market basket” analysis typically refers to products that are used many-per-customer, like groceries or essential items. (I buy many groceries every week.)

Example baskets: {A, B, C}, {A, C, D}, {B, C, D}, {A, D, E}, {B, C, E}

Rule          Support   Confidence
A ⇒ D         2/5       2/3
C ⇒ A         2/5       2/4
A ⇒ C         2/5       2/3
B & C ⇒ D     1/5       1/3

Source: Introduction to Pattern Discovery & SAS Institute
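The same analysis with an off-the-shelf library looks roughly like this (a sketch assuming the open-source mlxtend package; any association-rules implementation works similarly):

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    # The five baskets from the slide.
    baskets = [list("ABC"), list("ACD"), list("BCD"), list("ADE"), list("BCE")]

    # One-hot encode the baskets, mine frequent itemsets, then derive rules.
    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)
    frequent = apriori(onehot, min_support=0.2, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.3)
    print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])

The rows for A ⇒ D, C ⇒ A, A ⇒ C, and B & C ⇒ D should match the support and confidence values in the table above.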

17 of 32

Barbie Doll ⇒ Candy: what could a retailer do with this discovered rule?

  • Put them closer together in the store
  • Put them far apart in the store
  • Package candy bars with the dolls
  • Package Barbie + candy + poorly selling item
  • Raise the price on one
  • Offer Barbie accessories for proofs of purchase
  • Do not advertise candy and Barbie together
  • Offer candies in the shape of a Barbie doll
  • Recommend candy to people who buy Barbie dolls

Source: Introduction to Pattern Discovery & SAS Institute

18 of 32

Association Recommendation

  • Association recommender systems are simple but effective.
  • They rely only on transactional data.
  • They are likely the most direct method for creating personalized recommendations (see the sketch below):
    • “If you like X, then you might like Y”
    • “These items are often bought together”
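A minimal sketch of how mined rules turn into recommendations (the rules and lift values here are hypothetical): fire every rule whose antecedent is contained in the customer's basket and suggest the consequents, strongest association first.

    # Hypothetical mined rules: (antecedent, consequent, lift).
    rules = [
        ({"Diaper"}, {"Beer"}, 1.9),
        ({"Milk", "Bread"}, {"Eggs"}, 1.4),
        ({"Beer", "Bread"}, {"Milk"}, 1.1),
    ]

    basket = {"Milk", "Bread", "Diaper"}

    # Recommend consequents of rules whose antecedent the basket satisfies,
    # skipping items already in the basket, highest lift first.
    suggestions = sorted(
        ((cons, lift) for ante, cons, lift in rules
         if ante <= basket and not cons & basket),
        key=lambda pair: -pair[1],
    )
    for cons, lift in suggestions:
        print(cons, lift)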

19 of 32

Interlude

Supervised Rule-based Models

20 of 32

Supervised Rule-based Models

Many interesting options exist for supervised classification with rule-based models, which are often highly interpretable. Popular approaches include:

  • RuleFit (Friedman & Popescu, 2005)
  • Skope Rules
  • Scalable Bayesian Rule Lists (Yang, Rudin, & Seltzer, 2017)

[Figure: a description of Skope Rules.]
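Each of these packages ships its own estimator API. As a dependency-light stand-in (a plain decision tree, not RuleFit or Skope Rules themselves), scikit-learn can already print supervised if/then rules:

    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Fit a shallow tree and print its branches as human-readable rules.
    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(export_text(tree, feature_names=list(X.columns)))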

21 of 32

Scalable Bayesian Rule Lists

22 of 32

Recommendations

Netflix Case

23 of 32

The Netflix Prize

  • From 2006 to 2009, Netflix ran a contest asking the public to submit algorithms to predict user ratings for movies.

  • A training data set of ~100,000,000 ratings and a test data set of ~3,000,000 ratings were provided.

  • Netflix offered a grand prize of $1,000,000 to the team that could beat its own algorithm, Cinematch, by more than 10%, as measured by RMSE (see the note below).

Source: The Analytics Edge, MIT
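For n predicted ratings r̂_i against held-out true ratings r_i:

    RMSE = √( (1/n) Σ_i (r̂_i − r_i)² )

Lower is better, so beating Cinematch by 10% meant achieving an RMSE at least 10% lower than Cinematch's.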

24 of 32

Predicting the Best User Rating

  • Netflix was willing to pay over $1M for the best user rating algorithm, which shows how critical the recommendation system was to their business.

  • What data could be used to predict user ratings?

  • Every movie in Netflix’s database has ratings from all users who have rated that movie.

  • We also know facts about the movie itself: actors, director, genre classifications, year released, etc.

Source: The Analytics Edge, MIT

25 of 32

Using Other Users’ Ratings

  • Consider suggesting to Carl that he watch “Men in Black”, since Amy rated it highly and Carl and Amy seem to have similar preferences.

  • This technique is called Collaborative Filtering.

Source: The Analytics Edge, MIT

        Men in Black   Apollo 13   Top Gun   Terminator
Amy     5              4           5         4
Bob     3              2           5         –
Carl    –              5           4         4
Dan     4              –           2         –

(– = not rated)
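A minimal collaborative-filtering sketch over this table (NaN marks a movie the user has not rated; the predicted rating for Carl and Men in Black comes out near 4, supporting the suggestion above):

    import numpy as np

    # Rows: Amy, Bob, Carl, Dan.
    # Columns: Men in Black, Apollo 13, Top Gun, Terminator.
    R = np.array([
        [5.0, 4.0, 5.0, 4.0],
        [3.0, 2.0, 5.0, np.nan],
        [np.nan, 5.0, 4.0, 4.0],
        [4.0, np.nan, 2.0, np.nan],
    ])

    def cosine(u, v):
        """Cosine similarity over the movies both users rated."""
        both = ~np.isnan(u) & ~np.isnan(v)
        a, b = u[both], v[both]
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b)) if both.any() else 0.0

    # Predict Carl's rating for Men in Black as a similarity-weighted
    # average of the other users' ratings for that movie.
    carl, men_in_black = 2, 0
    others = [0, 1, 3]  # Amy, Bob, and Dan all rated Men in Black
    weights = np.array([cosine(R[carl], R[u]) for u in others])
    print(weights @ R[others, men_in_black] / weights.sum())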

26 of 32

Using Movie Information

This technique is called Content Filtering.

We saw that Amy liked “Men In Black”

  • It was directed by Barry Sonnenfeld

  • Classified in the genres of action, adventure, sci-fi and comedy

  • It stars actor Will Smith

Consider recommending to Amy (see the sketch below):

  • Barry Sonnenfeld’s movie “Get Shorty”

  • “Jurassic Park”, which is in the genres of action, adventure, and sci-fi

  • Will Smith’s movie “Hitch”

Source: The Analytics Edge, MIT
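A minimal content-filtering sketch; the attribute sets (director, genres, star) are illustrative, not a real catalog:

    # Attributes of the movie Amy liked, and of candidate recommendations.
    liked = {"Sonnenfeld", "action", "adventure", "sci-fi", "comedy", "Will Smith"}
    candidates = {
        "Get Shorty": {"Sonnenfeld", "comedy", "crime", "John Travolta"},
        "Jurassic Park": {"Spielberg", "action", "adventure", "sci-fi", "Sam Neill"},
        "Hitch": {"Tennant", "comedy", "romance", "Will Smith"},
    }

    def jaccard(a, b):
        """Shared attributes as a fraction of all attributes."""
        return len(a & b) / len(a | b)

    # Rank candidates by attribute overlap with the liked movie.
    for title in sorted(candidates, key=lambda t: -jaccard(liked, candidates[t])):
        print(title, round(jaccard(liked, candidates[title]), 2))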

27 of 32

Strengths and Weaknesses

Collaborative Filtering Systems:

  • Can accurately suggest complex items without understanding the nature of the items themselves.
  • Require a lot of data about each user to make accurate recommendations.
  • With millions of items, require a lot of computing power.

Content Filtering Systems:

  • Require very little data to get started.
  • Can be limited in scope.

Source: The Analytics Edge, MIT

28 of 32

Hybrid Recommendation Systems

  • Netflix uses both collaborative and content filtering.

  • For example, consider a collaborative filtering approach where we determine that Amy and Carl have similar preferences.

  • We could then do content filtering, where we would find that “Terminator”, which both Amy and Carl liked, is classified in almost the same set of genres as “Starship Troopers.”

  • Recommend “Starship Troopers” to both Amy and Carl, even though neither of them has seen it before.

Source: The Analytics Edge, MIT

29 of 32

Recommendation Method Used

Collaborative Filtering:

  • Amazon.com
  • Last.fm
  • Spotify
  • Facebook
  • LinkedIn
  • Google News
  • MySpace
  • Netflix

Content Filtering:

  • Pandora
  • IMDB
  • Rotten Tomatoes
  • Jinni
  • Rovi Corporation
  • See This Next
  • MovieLens
  • Netflix

Source: The Analytics Edge, MIT

30 of 32

Cornerstone of these Well-known Businesses

Source: The Analytics Edge, MIT

[Logos of well-known businesses built on recommendation systems.]

31 of 32

The Edge of Recommendation Systems

  • In today’s digital age, businesses often have hundreds of thousands of items to offer their customers.

  • Excellent recommendation systems can make or break these businesses.

  • Clustering algorithms, which are tailored to find similar customers or similar items, also form the backbone of many of these recommendation systems.

Source: The Analytics Edge, MIT

32 of 32

Reading

  • Introduction to Data Mining, Chapter 5
    • Sections 5.1–5.3 and 5.7