Building Tomorrow’s Products

Ammar Jawad,

Product Manager, Personalisation & ML Platform

Hotels.com at Expedia Group

Agenda

  • Supervised Learning
  • Reinforcement Learning & Personalisation
  • Building products with ML in mind

Introduction to Supervised Learning - I

What we know: data around patients cancelling & showing up to medical appointments.

What we want to know: the probability that a new patient will cancel their appointment.

Supervised Learning learns the mapping from the first to the second.

Introduction to Supervised Learning - II

What we know: data on patients cancelling medical appointments.

What we want to know: the probability that a new patient will cancel their appointment.

Understanding Supervised Learning - I

[Figure: the appointment dataset with the target labels highlighted]

Understanding Supervised Learning - II

[Figure: the dataset is split into training data and testing data]

Understanding Supervised Learning - III

The labels are hidden from the machine, but we know their true values.

Understanding Supervised Learning - IV

“Able to predict whether a patient will show up with 80% accuracy.”

Accuracy Paradox

  • In many cases we have an imbalanced dataset, e.g. 80% of patients showed up and only 20% were a no-show in the medical clinic dataset.
  • A naive model that always predicts the most frequent class (patients showing up) would therefore be 80% accurate (the so-called ‘null accuracy’).
  • The null accuracy helps us benchmark our machine-learned model, i.e. as a bare minimum it needs to be more accurate than 80%; see the sketch below.
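A minimal sketch of this benchmark, assuming a hypothetical imbalanced dataset and scikit-learn; the features, labels, and split here are illustrative, not the clinic data from the slides:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative, imbalanced toy data: ~80% show-ups (0), ~20% no-shows (1).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))             # hypothetical patient features
y = (rng.random(1000) < 0.2).astype(int)   # 1 = no-show

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Null accuracy: always predict the most frequent class.
null_model = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("null accuracy:", null_model.score(X_test, y_test))   # ~0.80

# A learned model is only useful if it beats that baseline.
model = LogisticRegression().fit(X_train, y_train)
print("model accuracy:", model.score(X_test, y_test))
```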

Accuracy Paradox - Not all mistakes are equal: A Medical example

In this scenario we want to reduce false negatives (i.e. maximise recall), as it is more dangerous to send a sick patient home without treatment than to send a healthy patient for more checks.

  • Actually Sick + Diagnosed Sick → True Positive: diagnosed sick and is sick.
  • Actually Sick + Diagnosed Healthy → False Negative: the patient is sick, but the model diagnoses them as healthy and sends them home.
  • Actually Healthy + Diagnosed Sick → False Positive: the patient is healthy but is diagnosed sick.
  • Actually Healthy + Diagnosed Healthy → True Negative: diagnosed healthy, sent home, and is healthy.

Accuracy Paradox - Not all mistakes are equal: A Spam example

In this scenario we want to reduce false positives (i.e. maximise precision), as it is better to receive the occasional spam email in your inbox than to have an important email end up in your spam folder. A sketch contrasting the two metrics follows the breakdown below.

  • Is Spam + Sent to Spam folder → True Positive: the email is spam and is sent to the spam folder.
  • Is Spam + Sent to Inbox → False Negative: the email is spam but is sent to the inbox.
  • Not Spam + Sent to Spam folder → False Positive: the email is not spam but lands in the spam folder.
  • Not Spam + Sent to Inbox → True Negative: the email is not spam and is correctly sent to the inbox.
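To make the two objectives concrete, here is a minimal sketch using scikit-learn’s metrics on hypothetical labels and predictions (1 = sick or spam); in the medical example you would track recall, in the spam example precision:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical ground truth and model predictions (1 = sick / spam).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")

# Recall = TP / (TP + FN): maximise it when false negatives are costly
# (the medical example - don't send a sick patient home).
print("recall:", recall_score(y_true, y_pred))        # 3 / (3 + 1) = 0.75

# Precision = TP / (TP + FP): maximise it when false positives are costly
# (the spam example - don't bury a real email in the spam folder).
print("precision:", precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
```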

Let’s look at a real ML model

(Go to GitHub)

Reinforcement Learning

A/B/n testing

[Diagram: A/B/n test. All visitors are split evenly across four button variants and their click-through rates are compared:]

Variant                      Traffic   CTR
Buy now                      25%       11.4%
ADD TO CART                  25%       7.1%
PAY NOW                      25%       3.4%
CLICK & SEE WHAT HAPPENS     25%       1.4%

WINNER FOUND!
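For later contrast with the bandit versions of this test, a minimal simulation of the fixed 25/25/25/25 split; the true click-through rates are illustrative and unknown to the experimenter:

```python
import random

TRUE_CTR = {"Buy now": 0.114, "ADD TO CART": 0.071,
            "PAY NOW": 0.034, "CLICK & SEE WHAT HAPPENS": 0.014}
clicks = {arm: 0 for arm in TRUE_CTR}
views = {arm: 0 for arm in TRUE_CTR}

for _ in range(10_000):
    arm = random.choice(list(TRUE_CTR))  # fixed, even 25% split
    views[arm] += 1
    clicks[arm] += random.random() < TRUE_CTR[arm]  # simulated click

for arm in TRUE_CTR:
    print(f"{arm}: {clicks[arm] / views[arm]:.1%} observed CTR")
# The variant with the highest observed CTR is declared the winner.
```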

Reinforcement Learning in Experimentation

[Diagram: two feedback loops. Multi-Armed Bandits: Action → Reward. Contextual Bandits: Context → Action → Reward.]

Multi-Armed Bandits

Online decision making

  • Should I exploit? (make the best decision given current information)
  • Should I explore? (gather more information)

The best long-term strategy may involve short-term sacrifices; a minimal sketch of this explore/exploit loop follows after the examples below.

Other examples include:

  • Ordering your favourite dish vs. picking a new item on the menu
  • Showing the most successful ad vs. trying a different one
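A minimal epsilon-greedy sketch of the loop, with illustrative click-through rates standing in for real rewards (the agent never sees them directly):

```python
import random

# Hypothetical true click-through rates per variant, unknown to the agent.
TRUE_CTR = {"Buy now": 0.114, "ADD TO CART": 0.071,
            "PAY NOW": 0.034, "CLICK & SEE WHAT HAPPENS": 0.014}

EPSILON = 0.2  # explore 20% of the time
counts = {arm: 0 for arm in TRUE_CTR}
values = {arm: 0.0 for arm in TRUE_CTR}  # running mean reward per arm

def choose_arm():
    if random.random() < EPSILON:          # explore: random action
        return random.choice(list(TRUE_CTR))
    return max(values, key=values.get)     # exploit: best known action

for _ in range(10_000):
    arm = choose_arm()
    reward = 1 if random.random() < TRUE_CTR[arm] else 0  # simulated click
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean

print("best arm so far:", max(values, key=values.get))
```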

Multi-Armed Bandits - Use-cases

Whenever we don’t know the right numerical answer or we don’t know what the sweet spot is, bandits are ideal:

  • What is the right % discount on coupons that would drive profitability?
  • What is the right level of zoom on Maps to reduce bounce rate?
  • Which combination of modules on the website will drive engagement?
  • What is the right number of images to display below a featured image on the booking form, and what size should they be?
  • Out of 100 colours to represent a property’s price, which colour drives the highest GPV?

Multi-Armed Bandits are effectively an approach to experimentation using machine learning.

Multi-Armed Bandits - Trade-off

Multi-Armed Bandits - Exploration

Three main approaches to exploration:

  1. Random exploration

Explore based on a probability of taking a random action, e.g. explore 20% of the time (the epsilon-greedy sketch earlier is an instance of this).

  2. Optimism in the face of uncertainty

When we know the value of every option except one, bias the choice towards the action with the unknown outcome; see the sketch below.

  3. Information state space

Treat the agent’s information as part of its state, and look ahead to estimate how gathering more information improves future reward.
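Approach 2 is the principle behind UCB-style algorithms: act as if every arm is as good as its upper confidence bound, so rarely tried (high-uncertainty) arms keep getting explored. A minimal UCB1 sketch, again with illustrative reward rates:

```python
import math
import random

TRUE_CTR = [0.114, 0.071, 0.034, 0.014]  # illustrative, unknown to the agent
counts = [0] * len(TRUE_CTR)   # times each arm was pulled
values = [0.0] * len(TRUE_CTR) # running mean reward per arm

for t in range(1, 10_001):
    if 0 in counts:                      # play every arm once first
        arm = counts.index(0)
    else:                                # optimism: mean + confidence bonus
        ucb = [values[i] + math.sqrt(2 * math.log(t) / counts[i])
               for i in range(len(TRUE_CTR))]
        arm = ucb.index(max(ucb))
    reward = 1 if random.random() < TRUE_CTR[arm] else 0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

# Most pulls should concentrate on the best arm over time.
print("traffic share:", [round(c / sum(counts), 3) for c in counts])
```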

Multi-Armed Bandits

[Diagram: Multi-Armed Bandit. All visitors start on an even 25% split across the four variants; based on the observed CTRs, traffic for the next 1,000 visitors is reallocated:]

Variant                      CTR      Traffic (next 1,000)
Buy now                      11.4%    70%
ADD TO CART                  7.1%     20%
PAY NOW                      3.4%     7%
CLICK & SEE WHAT HAPPENS     1.4%     3%
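One common way to produce this kind of adaptive split is Thompson sampling (an assumption here; the slides don’t name the algorithm): keep a Beta posterior over each variant’s CTR and route each visitor to the variant whose sampled CTR is highest. A minimal sketch:

```python
import random

TRUE_CTR = {"Buy now": 0.114, "ADD TO CART": 0.071,
            "PAY NOW": 0.034, "CLICK & SEE WHAT HAPPENS": 0.014}

# Beta(1, 1) prior per variant: alpha counts clicks, beta counts non-clicks.
alpha = {arm: 1 for arm in TRUE_CTR}
beta = {arm: 1 for arm in TRUE_CTR}
served = {arm: 0 for arm in TRUE_CTR}

for _ in range(1000):  # the "next 1,000" visitors
    # Sample a plausible CTR per arm and serve the best-looking one.
    arm = max(TRUE_CTR, key=lambda a: random.betavariate(alpha[a], beta[a]))
    served[arm] += 1
    if random.random() < TRUE_CTR[arm]:
        alpha[arm] += 1   # click observed
    else:
        beta[arm] += 1    # no click

print({arm: f"{100 * n / 1000:.0f}%" for arm, n in served.items()})
```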

Multi-Armed Bandits - Same Assumptions as A/B testing

  • If one bandit is built for a project, we will identify the best variant for the average user
  • If one bandit is built per segment, it will require additional work from Data Science

Contextual Bandits I

[Diagram: Contextual Bandits. The four button variants (Buy now, ADD TO CART, PAY NOW, CLICK & SEE WHAT HAPPENS) are evaluated per context rather than globally: a separate winner emerges by region (North America, Europe, Asia, Africa), by time of day (Morning, Noon, Evening, Night), and by customer type (Solo, Family, Romance, Business).]

Contextual Bandits II

[Diagram: Contextual Bandits. Each state combines several context dimensions, and each state gets its own winning variant:]

State                             Winner
North America + Night + Family    PAY NOW
Asia + Morning + Business         ADD TO CART
Europe + Evening + Solo           Buy now

Contextual Bandits

  • Delivering true personalisation at scale
  • Individuals are more likely to see content that they interact with favourably
  • Experimentation 2.0, but still prone to seasonality and still treats users as static (a minimal sketch follows below)
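A minimal sketch of the idea, assuming a small discrete context and an epsilon-greedy policy with one value estimate per (context, variant) pair; production systems typically learn a model over context features rather than a lookup table:

```python
import random
from collections import defaultdict

ARMS = ["Buy now", "ADD TO CART", "PAY NOW", "CLICK & SEE WHAT HAPPENS"]
EPSILON = 0.1

# One running mean reward per (context, arm) pair.
counts = defaultdict(int)
values = defaultdict(float)

def choose(context):
    """Epsilon-greedy over the arms, conditioned on the observed context."""
    if random.random() < EPSILON:
        return random.choice(ARMS)
    return max(ARMS, key=lambda arm: values[(context, arm)])

def update(context, arm, reward):
    key = (context, arm)
    counts[key] += 1
    values[key] += (reward - values[key]) / counts[key]

# Usage: each visitor arrives with a context, we pick a variant,
# observe a click (1) or no click (0), and update that context's estimates.
context = ("Europe", "Evening", "Solo")   # illustrative state
arm = choose(context)
update(context, arm, reward=1)
```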

Challenging Experimentation Assumptions

  • Given that user behaviour, opinions and preferences change over time, should our approach to experimentation not reflect that?
  • An experiment runs for a couple of days, yet its outcome persists throughout a user’s lifetime, despite:
    • Seasonality
    • Life-changing events
    • Geography

Opportunity Cost in Experimentation

Continuous Exploration

ML Opportunity Framework

Helps product teams evaluate whether a use-case:

  • Fulfills the data prerequisites
  • Should be powered by machine learning

ML Framework - Step 1

ML Framework - Step 2

Contact details
