Building Tomorrow’s Products

Ammar Jawad,

Product Manager, Personalisation & ML Platform

Hotels.com at Expedia Group

Agenda

  • Supervised Learning
  • Reinforcement Learning & Personalisation
  • Building products with ML in mind

Introduction to Supervised Learning - I

What we know: data around patients cancelling & showing up to medical appointments.

What we want to know: the probability that a new patient will cancel their appointment.

Supervised Learning learns the mapping from the first to the second.

Introduction to Supervised Learning - II

What we know: data on patients cancelling medical appointments.

What we want to know: the probability that a new patient will cancel their appointment.

Understanding Supervised Learning - I

[Figure: the appointment dataset with the target labels highlighted]

Understanding Supervised Learning - II

[Figure: the dataset is split into training data and testing data]

Understanding Supervised Learning - III

The labels are hidden from the machine, but we know their true values.

Understanding Supervised Learning - IV

“Able to predict whether a patient will show up with 80% accuracy.”

Accuracy Paradox

  • In many cases we have an imbalanced dataset, e.g. 80% of patients showed up and only 20% were a no-show in the medical clinic dataset.
  • A naive model that always predicts the most frequent class (patients showing up) would therefore be 80% accurate (the so-called ‘null accuracy’).
  • The null accuracy helps us benchmark our machine-learned model, i.e. as a bare minimum it needs to be more accurate than 80%; see the sketch below.
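A minimal sketch of this benchmark, assuming a hypothetical imbalanced dataset and scikit-learn; the features, labels, and split here are illustrative, not the clinic data from the slides:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative, imbalanced toy data: ~80% show-ups (0), ~20% no-shows (1).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))             # hypothetical patient features
y = (rng.random(1000) < 0.2).astype(int)   # 1 = no-show

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Null accuracy: always predict the most frequent class.
null_model = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("null accuracy:", null_model.score(X_test, y_test))   # ~0.80

# A learned model is only useful if it beats that baseline.
model = LogisticRegression().fit(X_train, y_train)
print("model accuracy:", model.score(X_test, y_test))
```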

Accuracy Paradox - Not all mistakes are equal: A Medical example

In this scenario we want to reduce false negatives (i.e. maximise recall), as it is more dangerous to send a sick patient home without treatment than to send a healthy patient for more checks.

  • Actually Sick + Diagnosed Sick → True Positive: diagnosed sick and is sick.
  • Actually Sick + Diagnosed Healthy → False Negative: the patient is sick, but the model diagnoses them as healthy and sends them home.
  • Actually Healthy + Diagnosed Sick → False Positive: the patient is healthy but is diagnosed sick.
  • Actually Healthy + Diagnosed Healthy → True Negative: diagnosed healthy, sent home, and is healthy.

Accuracy Paradox - Not all mistakes are equal: A Spam example

In this scenario we want to reduce false positives (i.e. maximise precision), as it is better to receive the occasional spam email in your inbox than to have an important email end up in your spam folder. A sketch contrasting the two metrics follows the breakdown below.

  • Is Spam + Sent to Spam folder → True Positive: the email is spam and is sent to the spam folder.
  • Is Spam + Sent to Inbox → False Negative: the email is spam but is sent to the inbox.
  • Not Spam + Sent to Spam folder → False Positive: the email is not spam but lands in the spam folder.
  • Not Spam + Sent to Inbox → True Negative: the email is not spam and is correctly sent to the inbox.
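To make the two objectives concrete, here is a minimal sketch using scikit-learn’s metrics on hypothetical labels and predictions (1 = sick or spam); in the medical example you would track recall, in the spam example precision:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical ground truth and model predictions (1 = sick / spam).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")

# Recall = TP / (TP + FN): maximise it when false negatives are costly
# (the medical example - don't send a sick patient home).
print("recall:", recall_score(y_true, y_pred))        # 3 / (3 + 1) = 0.75

# Precision = TP / (TP + FP): maximise it when false positives are costly
# (the spam example - don't bury a real email in the spam folder).
print("precision:", precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
```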

Let’s look at a real ML model

(Go to GitHub)

Reinforcement Learning

A/B/n testing

[Diagram: A/B/n test. All visitors are split evenly across four button variants and their click-through rates are compared:]

Variant                      Traffic   CTR
Buy now                      25%       11.4%
ADD TO CART                  25%       7.1%
PAY NOW                      25%       3.4%
CLICK & SEE WHAT HAPPENS     25%       1.4%

WINNER FOUND!
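For later contrast with the bandit versions of this test, a minimal simulation of the fixed 25/25/25/25 split; the true click-through rates are illustrative and unknown to the experimenter:

```python
import random

TRUE_CTR = {"Buy now": 0.114, "ADD TO CART": 0.071,
            "PAY NOW": 0.034, "CLICK & SEE WHAT HAPPENS": 0.014}
clicks = {arm: 0 for arm in TRUE_CTR}
views = {arm: 0 for arm in TRUE_CTR}

for _ in range(10_000):
    arm = random.choice(list(TRUE_CTR))  # fixed, even 25% split
    views[arm] += 1
    clicks[arm] += random.random() < TRUE_CTR[arm]  # simulated click

for arm in TRUE_CTR:
    print(f"{arm}: {clicks[arm] / views[arm]:.1%} observed CTR")
# The variant with the highest observed CTR is declared the winner.
```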

Reinforcement Learning in Experimentation

[Diagram: two feedback loops. Multi-Armed Bandits: Action → Reward. Contextual Bandits: Context → Action → Reward.]

Multi-Armed Bandits

Online decision making

  • Should I exploit? (make the best decision given current information)
  • Should I explore? (gather more information)

The best long-term strategy may involve short-term sacrifices; a minimal sketch of this explore/exploit loop follows after the examples below.

Other examples include:

  • Ordering your favourite dish vs. picking a new item on the menu
  • Showing the most successful ad vs. trying a different one
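A minimal epsilon-greedy sketch of the loop, with illustrative click-through rates standing in for real rewards (the agent never sees them directly):

```python
import random

# Hypothetical true click-through rates per variant, unknown to the agent.
TRUE_CTR = {"Buy now": 0.114, "ADD TO CART": 0.071,
            "PAY NOW": 0.034, "CLICK & SEE WHAT HAPPENS": 0.014}

EPSILON = 0.2  # explore 20% of the time
counts = {arm: 0 for arm in TRUE_CTR}
values = {arm: 0.0 for arm in TRUE_CTR}  # running mean reward per arm

def choose_arm():
    if random.random() < EPSILON:          # explore: random action
        return random.choice(list(TRUE_CTR))
    return max(values, key=values.get)     # exploit: best known action

for _ in range(10_000):
    arm = choose_arm()
    reward = 1 if random.random() < TRUE_CTR[arm] else 0  # simulated click
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean

print("best arm so far:", max(values, key=values.get))
```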

Multi-Armed Bandits - Use-cases

Whenever we don’t know the right numerical answer or we don’t know what the sweet spot is, bandits are ideal:

  • What is the right % discount on coupons that would drive profitability?
  • What is the right level of zoom on Maps to reduce bounce rate?
  • Which combination of modules on the website will drive engagement?
  • What is the right number of images to display below a featured image on the booking form, and what size should they be?
  • Out of 100 colours to represent a property’s price, which colour drives the highest GPV?

Multi-Armed Bandits are effectively an approach to experimentation using machine learning.

Multi-Armed Bandits - Trade-off

Multi-Armed Bandits - Exploration

Three main approaches to exploration:

  1. Random exploration

Explore based on a probability of taking a random action, e.g. explore 20% of the time (the epsilon-greedy sketch earlier is an instance of this).

  2. Optimism in the face of uncertainty

When we know the value of every option except one, bias the choice towards the action with the unknown outcome; see the sketch below.

  3. Information state space

Treat the agent’s information as part of its state, and look ahead to estimate how gathering more information improves future reward.
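Approach 2 is the principle behind UCB-style algorithms: act as if every arm is as good as its upper confidence bound, so rarely tried (high-uncertainty) arms keep getting explored. A minimal UCB1 sketch, again with illustrative reward rates:

```python
import math
import random

TRUE_CTR = [0.114, 0.071, 0.034, 0.014]  # illustrative, unknown to the agent
counts = [0] * len(TRUE_CTR)   # times each arm was pulled
values = [0.0] * len(TRUE_CTR) # running mean reward per arm

for t in range(1, 10_001):
    if 0 in counts:                      # play every arm once first
        arm = counts.index(0)
    else:                                # optimism: mean + confidence bonus
        ucb = [values[i] + math.sqrt(2 * math.log(t) / counts[i])
               for i in range(len(TRUE_CTR))]
        arm = ucb.index(max(ucb))
    reward = 1 if random.random() < TRUE_CTR[arm] else 0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

# Most pulls should concentrate on the best arm over time.
print("traffic share:", [round(c / sum(counts), 3) for c in counts])
```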

Multi-Armed Bandits

[Diagram: Multi-Armed Bandit. All visitors start on an even 25% split across the four variants; based on the observed CTRs, traffic for the next 1,000 visitors is reallocated:]

Variant                      CTR      Traffic (next 1,000)
Buy now                      11.4%    70%
ADD TO CART                  7.1%     20%
PAY NOW                      3.4%     7%
CLICK & SEE WHAT HAPPENS     1.4%     3%
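One common way to produce this kind of adaptive split is Thompson sampling (an assumption here; the slides don’t name the algorithm): keep a Beta posterior over each variant’s CTR and route each visitor to the variant whose sampled CTR is highest. A minimal sketch:

```python
import random

TRUE_CTR = {"Buy now": 0.114, "ADD TO CART": 0.071,
            "PAY NOW": 0.034, "CLICK & SEE WHAT HAPPENS": 0.014}

# Beta(1, 1) prior per variant: alpha counts clicks, beta counts non-clicks.
alpha = {arm: 1 for arm in TRUE_CTR}
beta = {arm: 1 for arm in TRUE_CTR}
served = {arm: 0 for arm in TRUE_CTR}

for _ in range(1000):  # the "next 1,000" visitors
    # Sample a plausible CTR per arm and serve the best-looking one.
    arm = max(TRUE_CTR, key=lambda a: random.betavariate(alpha[a], beta[a]))
    served[arm] += 1
    if random.random() < TRUE_CTR[arm]:
        alpha[arm] += 1   # click observed
    else:
        beta[arm] += 1    # no click

print({arm: f"{100 * n / 1000:.0f}%" for arm, n in served.items()})
```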

Multi-Armed Bandits - Same Assumptions as A/B testing

  • If one bandit is built for a project, we will identify the best variant for the average user
  • If one bandit is built per segment, it will require additional work from Data Science

Contextual Bandits I

[Diagram: Contextual Bandits. The four button variants (Buy now, ADD TO CART, PAY NOW, CLICK & SEE WHAT HAPPENS) are evaluated per context rather than globally: a separate winner emerges by region (North America, Europe, Asia, Africa), by time of day (Morning, Noon, Evening, Night), and by customer type (Solo, Family, Romance, Business).]

Contextual Bandits II

[Diagram: Contextual Bandits. Each state combines several context dimensions, and each state gets its own winning variant:]

State                             Winner
North America + Night + Family    PAY NOW
Asia + Morning + Business         ADD TO CART
Europe + Evening + Solo           Buy now

Contextual Bandits

  • Delivering true personalisation at scale
  • Individuals are more likely to see content that they interact with favourably
  • Experimentation 2.0, but still prone to seasonality and still treats users as static (a minimal sketch follows below)
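A minimal sketch of the idea, assuming a small discrete context and an epsilon-greedy policy with one value estimate per (context, variant) pair; production systems typically learn a model over context features rather than a lookup table:

```python
import random
from collections import defaultdict

ARMS = ["Buy now", "ADD TO CART", "PAY NOW", "CLICK & SEE WHAT HAPPENS"]
EPSILON = 0.1

# One running mean reward per (context, arm) pair.
counts = defaultdict(int)
values = defaultdict(float)

def choose(context):
    """Epsilon-greedy over the arms, conditioned on the observed context."""
    if random.random() < EPSILON:
        return random.choice(ARMS)
    return max(ARMS, key=lambda arm: values[(context, arm)])

def update(context, arm, reward):
    key = (context, arm)
    counts[key] += 1
    values[key] += (reward - values[key]) / counts[key]

# Usage: each visitor arrives with a context, we pick a variant,
# observe a click (1) or no click (0), and update that context's estimates.
context = ("Europe", "Evening", "Solo")   # illustrative state
arm = choose(context)
update(context, arm, reward=1)
```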

Challenging Experimentation Assumptions

  • Given that user behaviour, opinions and preferences change over time, should our approach to experimentation not reflect that?
  • An experiment runs for a couple of days, yet its outcome persists throughout a user’s lifetime, despite:
    • Seasonality
    • Life-changing events
    • Geography

Opportunity Cost in Experimentation

Continuous Exploration

ML Opportunity Framework

Helps product teams evaluate whether a use-case:

  • Fulfills the data prerequisites
  • Should be powered by machine learning

ML Framework - Step 1

ML Framework - Step 2

Contact details
