1 of 32

Machine Learning II:

Association Rules and Recommendation

Patrick Hall

Visiting Faculty, Department of Decision Sciences

George Washington University

2 of 32

Lecture 11 Agenda

  • Association Rules
    • Rule Evaluation Metrics
  • Market Basket Analysis
  • Interlude: Supervised Rule-based Models
  • Recommendations
    • Netflix Case
  • Readings

3 of 32

Where are we in the modeling lifecycle?

[Diagram: the modeling lifecycle: Data Collection & ETL, Feature Selection & Engineering, Supervised Learning, Unsupervised Learning, Assessment & Validation, and Deployment, arranged along a spectrum from cost intensive to revenue generating.]

4 of 32

Association Rules

Overview

Rule Evaluation Metrics

5 of 32

Overview: Association Rules

  • Association analysis is useful for discovering interesting relationships hidden in large data sets:
    • Undiscovered relationships can be represented in the form of sets of items known as frequent itemsets or association rules.
  • Association rules suggest relationships in transactions that can be used in many business practices.
  • Two key issues that must be addressed in association analysis:
    • Discovering patterns from large transactional datasets can be computationally expensive.
    • Some of the discovered patterns may be spurious (random).

Source: Introduction to Data Mining

6 of 32

Mining Association Rules

Two-step approach:

  1. Frequent itemset generation: Generate all itemsets whose support ≥ the minimum support hyperparameter.
  2. Rule generation: Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset.

Frequent itemset generation is computationally expensive! (A minimal sketch follows below.)

Source: Introduction to Data Mining
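Step 1 can be sketched in a few lines of Python. The transactions below are illustrative (chosen to match the support counts quoted on the following slides), and the brute-force counting stands in for the candidate pruning and smarter counting a real Apriori implementation uses:

    # Illustrative market-basket transactions.
    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]
    min_support = 0.4
    n = len(transactions)

    def support(itemset):
        """Fraction of transactions that contain every item in itemset."""
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent individual items.
    frequent = [frozenset({i}) for i in set().union(*transactions)
                if support({i}) >= min_support]

    # Level k: join frequent (k-1)-itemsets into k-item candidates, then
    # keep those that meet min_support. (Real Apriori also prunes any
    # candidate with an infrequent subset before counting.)
    k = 2
    while frequent:
        print(f"Frequent {k - 1}-itemsets:", [set(s) for s in frequent])
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        frequent = [c for c in candidates if support(c) >= min_support]
        k += 1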

7 of 32

Association Rules Mining

Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

Source: Introduction to Data Mining

Market-Basket Transactions

[Table: see the illustrative transaction set in the sketch on the previous slide.]

Examples of Association Rules:

  • {Diaper} → {Beer}
  • {Milk, Bread} → {Eggs, Coke}
  • {Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!

8 of 32

Terms and Definitions

  • Itemset:
    • A collection of one or more items
      • Example: {Milk, Bread, Diaper}
    • k-itemset
      • An itemset that contains k items
  • Frequent itemset:
    • An itemset whose support is greater than or equal to a minimum support hyperparameter
  • Support count (σ):
    • Frequency of occurrence of an itemset
    • E.g., σ({Milk, Bread, Diaper}) = 2
  • Support (s):
    • Fraction of transactions that contain an itemset
    • E.g., s({Milk, Bread, Diaper}) = 2/5

Source: Introduction to Data Mining

9 of 32

Rule Evaluation Metrics

  • Association Rules:
    • An expression of the form X → Y, where X and Y are itemsets.
    • Example: {Milk, Diaper} → {Beer}
  • Rule Evaluation Metrics:
    • Support (s): fraction of transactions that contain both X and Y.
    • Confidence (c): measures how often items in Y appear in transactions that contain X.
    • Expected confidence (E[c]): fraction of transactions that contain Y.
    • Lift (l): measure of association between itemsets: confidence divided by expected confidence (see the formulas below).

Source: Introduction to Data Mining
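In symbols, for a rule X → Y mined from a transaction set T, with σ(·) denoting the support count:

    s(X → Y) = σ(X ∪ Y) / |T|
    c(X → Y) = σ(X ∪ Y) / σ(X)
    E[c](X → Y) = σ(Y) / |T|
    l(X → Y) = c / E[c]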


10 of 32

Association Rules

Example of Rules:

  • {Milk, Diaper} → {Beer} (s=0.4, c=0.67)
  • {Milk, Beer} → {Diaper} (s=0.4, c=1.0)
  • {Diaper, Beer} → {Milk} (s=0.4, c=0.67)
  • {Beer} → {Milk, Diaper} (s=0.4, c=0.67)
  • {Diaper} → {Milk, Beer} (s=0.4, c=0.5)
  • {Milk} → {Diaper, Beer} (s=0.4, c=0.5)

Observations:

  • All the rules are binary partitions of the same itemset: {Milk, Diaper, Beer}.
  • Rules originating from the same itemset have identical support but can have different confidence.
  • Thus, we may decouple the support and confidence requirements.

Source: Introduction to Data Mining

11 of 32

Rule Evaluation Metrics: Support

Support is the probability that the two itemsets in a rule occur together.

Itemset: {KLYT6, IA45Y}

Rule: {KLYT6} → {IA45Y}

KLYT6 and IA45Y occur together in 4 of the 5 transactions below:

s = σ({KLYT6, IA45Y}) / |T| = 4/5 = 0.8

Support is symmetric:

s({KLYT6} → {IA45Y}) = s({IA45Y} → {KLYT6})

Transaction ID   Products
11231            IA45Y, DDD7M, BDR5U0, KLYT6
11232            TTR45H, HJL9B, DDD7M, IA45Y, BDR5U0, KLYT6
11233            KLYT6, IA45Y, DDD7M
11234            KLYT6, BDR5U0
11235            DDD7M, KLYT6, BDR5U0, IA45Y

12 of 32

Rule Evaluation Metrics: Confidence

Confidence is the conditional probability that a transaction contains itemset Y given that it contains itemset X.

Rule: {KLYT6} → {IA45Y}

4 of the 5 transactions that contain KLYT6 (X) also contain IA45Y (Y):

c = σ({KLYT6, IA45Y}) / σ({KLYT6}) = 4/5 = 0.8

(The transaction table on the previous slide applies here as well.)

13 of 32

Rule Evaluation Metrics: Expected Confidence

Expected confidence is the proportion of transactions that contain itemset Y: the confidence the rule would have if X and Y were independent.

Rule: {KLYT6} → {IA45Y}

4 of the 5 transactions contain IA45Y:

E[c] = σ({IA45Y}) / |T| = 4/5 = 0.8

(The transaction table on the Support slide applies here as well.)

14 of 32

Rule Evaluation Metrics: Lift

Lift is the confidence of the rule divided by the expected confidence.

Rule: {KLYT6} → {IA45Y}

l = c / E[c] = 0.8 / 0.8 = 1

Lift can be interpreted as a measure of association between the two itemsets: values greater than 1 indicate positive association, a value of 1 indicates no association (independence), and values less than 1 indicate negative association.

(The transaction table on the Support slide applies here as well.)
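All four metrics for this running example can be verified in a few lines of Python, using the transaction table above:

    # The transactions from the table, keyed by transaction ID.
    transactions = {
        11231: {"IA45Y", "DDD7M", "BDR5U0", "KLYT6"},
        11232: {"TTR45H", "HJL9B", "DDD7M", "IA45Y", "BDR5U0", "KLYT6"},
        11233: {"KLYT6", "IA45Y", "DDD7M"},
        11234: {"KLYT6", "BDR5U0"},
        11235: {"DDD7M", "KLYT6", "BDR5U0", "IA45Y"},
    }
    n = len(transactions)
    X, Y = {"KLYT6"}, {"IA45Y"}

    def sigma(itemset):
        """Support count: transactions that contain every item in itemset."""
        return sum(itemset <= t for t in transactions.values())

    support = sigma(X | Y) / n            # 4/5 = 0.8
    confidence = sigma(X | Y) / sigma(X)  # 4/5 = 0.8
    expected_c = sigma(Y) / n             # 4/5 = 0.8
    lift = confidence / expected_c        # 0.8 / 0.8 = 1.0: no association
    print(support, confidence, expected_c, lift)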

15 of 32

Market Basket Analysis

Overview

16 of 32

Market Basket Analysis

“Association” analysis typically refers to products that are used one-per-customer, like personal banking products. (I have one checking account and one savings account.)

“Market basket” analysis typically refers to products that are used many-per-customer, like groceries or essential items. (I buy many groceries every week.)

Example baskets: {A, B, C}, {A, C, D}, {B, C, D}, {A, D, E}, {B, C, E}

Rule          Support   Confidence
A ⇒ D         2/5       2/3
C ⇒ A         2/5       2/4
A ⇒ C         2/5       2/3
B & C ⇒ D     1/5       1/3

Source: Introduction to Pattern Discovery & SAS Institute
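The same analysis with an off-the-shelf library looks roughly like this (a sketch assuming the open-source mlxtend package; any association-rules implementation works similarly):

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    # The five baskets from the slide.
    baskets = [list("ABC"), list("ACD"), list("BCD"), list("ADE"), list("BCE")]

    # One-hot encode the baskets, mine frequent itemsets, then derive rules.
    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)
    frequent = apriori(onehot, min_support=0.2, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.3)
    print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])

The rows for A ⇒ D, C ⇒ A, A ⇒ C, and B & C ⇒ D should match the support and confidence values in the table above.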

17 of 32

Barbie Doll ⇒ Candy: what could a retailer do with this discovered rule?

  • Put them closer together in the store
  • Put them far apart in the store
  • Package candy bars with the dolls
  • Package Barbie + candy + poorly selling item
  • Raise the price on one
  • Offer Barbie accessories for proofs of purchase
  • Do not advertise candy and Barbie together
  • Offer candies in the shape of a Barbie doll
  • Recommend candy to people who buy Barbie dolls

Source: Introduction to Pattern Discovery & SAS Institute

18 of 32

Association Recommendation

  • Association recommender systems are simple but effective.
  • They rely only on transactional data.
  • They are likely the most direct method for creating personalized recommendations (see the sketch below):
    • “If you like X, then you might like Y”
    • “These items are often bought together”
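A minimal sketch of how mined rules turn into recommendations (the rules and lift values here are hypothetical): fire every rule whose antecedent is contained in the customer's basket and suggest the consequents, strongest association first.

    # Hypothetical mined rules: (antecedent, consequent, lift).
    rules = [
        ({"Diaper"}, {"Beer"}, 1.9),
        ({"Milk", "Bread"}, {"Eggs"}, 1.4),
        ({"Beer", "Bread"}, {"Milk"}, 1.1),
    ]

    basket = {"Milk", "Bread", "Diaper"}

    # Recommend consequents of rules whose antecedent the basket satisfies,
    # skipping items already in the basket, highest lift first.
    suggestions = sorted(
        ((cons, lift) for ante, cons, lift in rules
         if ante <= basket and not cons & basket),
        key=lambda pair: -pair[1],
    )
    for cons, lift in suggestions:
        print(cons, lift)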

19 of 32

Interlude

Supervised Rule-based Models

20 of 32

Supervised Rule-based Models

Many interesting options exist for supervised classification with rule-based models, which are often highly interpretable. Popular approaches include:

  • RuleFit (Friedman & Popescu, 2005)
  • Skope Rules
  • Scalable Bayesian Rule Lists (Yang, Rudin, & Seltzer, 2017)

[Figure: a description of Skope Rules.]
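Each of these packages ships its own estimator API. As a dependency-light stand-in (a plain decision tree, not RuleFit or Skope Rules themselves), scikit-learn can already print supervised if/then rules:

    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Fit a shallow tree and print its branches as human-readable rules.
    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(export_text(tree, feature_names=list(X.columns)))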

21 of 32

Scalable Bayesian Rule Lists

22 of 32

Recommendations

Netflix Case

23 of 32

The Netflix Prize

  • From 2006 to 2009, Netflix ran a contest asking the public to submit algorithms to predict user ratings for movies.

  • A training data set of ~100,000,000 ratings and a test data set of ~3,000,000 ratings were provided.

  • Netflix offered a grand prize of $1,000,000 to the team that could beat its own algorithm, Cinematch, by more than 10%, as measured by RMSE (see the note below).

Source: The Analytics Edge, MIT
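For n predicted ratings r̂_i against held-out true ratings r_i:

    RMSE = √( (1/n) Σ_i (r̂_i − r_i)² )

Lower is better, so beating Cinematch by 10% meant achieving an RMSE at least 10% lower than Cinematch's.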

24 of 32

Predicting the Best User Rating

  • Netflix was willing to pay over $1M for the best user rating algorithm, which shows how critical the recommendation system was to their business.

  • What data could be used to predict user ratings?

  • Every movie in Netflix’s database has ratings from all users who have rated that movie.

  • We also know facts about the movie itself: actors, director, genre classifications, year released, etc.

Source: The Analytics Edge, MIT

25 of 32

Using Other Users’ Ratings

  • Consider suggesting to Carl that he watch “Men in Black”, since Amy rated it highly and Carl and Amy seem to have similar preferences.

  • This technique is called Collaborative Filtering.

Source: The Analytics Edge, MIT

        Men in Black   Apollo 13   Top Gun   Terminator
Amy     5              4           5         4
Bob     3              2           5         –
Carl    –              5           4         4
Dan     4              –           2         –

(– = not rated)
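A minimal collaborative-filtering sketch over this table (NaN marks a movie the user has not rated; the predicted rating for Carl and Men in Black comes out near 4, supporting the suggestion above):

    import numpy as np

    # Rows: Amy, Bob, Carl, Dan.
    # Columns: Men in Black, Apollo 13, Top Gun, Terminator.
    R = np.array([
        [5.0, 4.0, 5.0, 4.0],
        [3.0, 2.0, 5.0, np.nan],
        [np.nan, 5.0, 4.0, 4.0],
        [4.0, np.nan, 2.0, np.nan],
    ])

    def cosine(u, v):
        """Cosine similarity over the movies both users rated."""
        both = ~np.isnan(u) & ~np.isnan(v)
        a, b = u[both], v[both]
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b)) if both.any() else 0.0

    # Predict Carl's rating for Men in Black as a similarity-weighted
    # average of the other users' ratings for that movie.
    carl, men_in_black = 2, 0
    others = [0, 1, 3]  # Amy, Bob, and Dan all rated Men in Black
    weights = np.array([cosine(R[carl], R[u]) for u in others])
    print(weights @ R[others, men_in_black] / weights.sum())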

26 of 32

Using Movie Information

This technique is called Content Filtering.

We saw that Amy liked “Men In Black”

  • It was directed by Barry Sonnenfeld

  • Classified in the genres of action, adventure, sci-fi and comedy

  • It stars actor Will Smith

Consider recommending to Amy (see the sketch below):

  • Barry Sonnenfeld’s movie “Get Shorty”

  • “Jurassic Park”, which is in the genres of action, adventure, and sci-fi

  • Will Smith’s movie “Hitch”

Source: The Analytics Edge, MIT
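A minimal content-filtering sketch; the attribute sets (director, genres, star) are illustrative, not a real catalog:

    # Attributes of the movie Amy liked, and of candidate recommendations.
    liked = {"Sonnenfeld", "action", "adventure", "sci-fi", "comedy", "Will Smith"}
    candidates = {
        "Get Shorty": {"Sonnenfeld", "comedy", "crime", "John Travolta"},
        "Jurassic Park": {"Spielberg", "action", "adventure", "sci-fi", "Sam Neill"},
        "Hitch": {"Tennant", "comedy", "romance", "Will Smith"},
    }

    def jaccard(a, b):
        """Shared attributes as a fraction of all attributes."""
        return len(a & b) / len(a | b)

    # Rank candidates by attribute overlap with the liked movie.
    for title in sorted(candidates, key=lambda t: -jaccard(liked, candidates[t])):
        print(title, round(jaccard(liked, candidates[title]), 2))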

27 of 32

Strengths and Weaknesses

Collaborative Filtering Systems:

  • Can accurately suggest complex items without understanding the nature of the items themselves.
  • Require a lot of data about each user to make accurate recommendations.
  • With millions of items, require a lot of computing power.

Content Filtering Systems:

  • Require very little data to get started.
  • Can be limited in scope.

Source: The Analytics Edge, MIT

28 of 32

Hybrid Recommendation Systems

  • Netflix uses both collaborative and content filtering.

  • For example, consider a collaborative filtering approach where we determine that Amy and Carl have similar preferences.

  • We could then do content filtering, where we would find that “Terminator”, which both Amy and Carl liked, is classified in almost the same set of genres as “Starship Troopers.”

  • Recommend “Starship Troopers” to both Amy and Carl, even though neither of them has seen it before.

Source: The Analytics Edge, MIT

29 of 32

Recommendation Method Used

Collaborative Filtering:

  • Amazon.com
  • Last.fm
  • Spotify
  • Facebook
  • LinkedIn
  • Google News
  • MySpace
  • Netflix

Content Filtering:

  • Pandora
  • IMDB
  • Rotten Tomatoes
  • Jinni
  • Rovi Corporation
  • See This Next
  • MovieLens
  • Netflix

Source: The Analytics Edge, MIT

30 of 32

Cornerstone of these Well-known Businesses

Source: The Analytics Edge, MIT

[Logos of well-known businesses built on recommendation systems.]

31 of 32

The Edge of Recommendation Systems

  • In today’s digital age, businesses often have hundreds of thousands of items to offer their customers.

  • Excellent recommendation systems can make or break these businesses.

  • Clustering algorithms, which are tailored to find similar customers or similar items, also form the backbone of many of these recommendation systems.

Source: The Analytics Edge, MIT

32 of 32

Reading

  • Introduction to Data Mining, Chapter 5
    • Sections 5.1–5.3 and 5.7