
Weibull Time To Event RNN

“An algorithm & philosophy about predicting when things will happen”

Egil Martinsson

ML Jeju Camp

2017-07-27

Mentor : Jeongkyu Shin

Help : Joongi Kim & Jonghyun Park

ragulpr.github.io

github.com/ragulpr/wtte-rnn

Thesis


About me

  • Independent consultant / machine learner / data scientist
  • BSc Statistics
  • Mathematics (California State University, Long Beach)
  • Master's program in Engineering Mathematics & Computational Science (Chalmers University of Technology)
  • Thesis : “WTTE-RNN : Weibull Time To Event Recurrent Neural Network”
  • Been working on and thinking about this problem for too long


Summary of ML Jeju Camp

Before

  • Thesis
  • Bunch of ideas
  • Github repository 0.0.2
  • Some simple example implementation
    • Tiny datasets, not scalable
    • Only discrete data of a certain shape

After

  • Production-ready 1.0.0 code
  • Technical advances
  • Tests
  • Examples


WTTE-RNN what?


Problem

  1. “What’s the probability that _____ stops?”
  2. “What’s the probability that _____ doesn’t happen within x days?”
  3. “How long until ______ happens?”
  4. “What’s the distribution over the time to the next event?”

“When will something happen?”

≈ “When will something stop happening?”

  • Billion dollar question!


Solution: WTTE-RNN

Let your machine learning model output the parameters of a distribution and train it with a magic loss function.

Why Weibull?

  • Because it’s awesome

Why RNN?

  • The machine learning algorithm can be anything gradient-based, but RNNs are awesome



  1. Frame the problem:
  • Not a regression or classification problem

Example : commits to the Tensorflow github repository

[Figure slides: worked example on the Tensorflow commit data]

Hacky solution : Binary

[Figure slides: illustrating the binary setup]

  • Annoying hyperparameters
  • Non-informative predictions
  • Sparsity
  • Hacky


Less Hacky solution : WTTE-RNN


“Clever loss function”


Why Weibull Distribution?

  • Continuous or discretized
  • Closed form (see below)
  • Shows up in nature
  • Regularization
  • Flexible
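
For reference, the closed forms behind those bullets (the standard two-parameter Weibull with scale $\alpha$ and shape $\beta$):

$$S(t) = e^{-(t/\alpha)^{\beta}}, \qquad \Lambda(t) = \left(\frac{t}{\alpha}\right)^{\beta}, \qquad f(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1} e^{-(t/\alpha)^{\beta}}$$

For the discretized version, the probability of an event in step $t$ is simply $p(t) = S(t) - S(t+1)$.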



  1. Frame the problem:
  • Not a regression or classification problem
  2. Predict a distribution
  3. Use a clever loss from survival analysis
  4. Train RNNs
  5. Get useful predictions:
  • Current & future risk
  • Expected time to event
  • Interpretable 2d-embeddings

Vision:

  • Simplicity
  • Generalizable

Example : commits to the Tensorflow github repository


Summary of ML Jeju Camp

Before

  • Thesis
  • Bunch of ideas
  • Github repository 0.0.2
  • Some simple example implementation
    • Tiny datasets, not scalable
    • Only discrete data of a certain shape

After

  • Production-level code
    • Documentation
    • Full data pipeline
    • 3x speedups
    • Rigorous testing
    • PyPI-installable package
    • Dev-friendly code structure
  • Technical advances
    • Scalable basic examples
    • Template to apply to any problem
    • Continuous/discrete/padded/unpadded data pipeline
    • Verified on GPU
    • Verified stability with large networks
  • Examples
    • Clickstream data, git logs, artificial data
    • More to come


Some real examples


Take any dataframe with (ID, Timestamp, Features) and transform it into what you need for training (sketch below):

  • Events (matrix): [n_seq, n_timesteps]
  • Censoring indicators (matrix): [n_seq, n_timesteps]
  • TTE (matrix): [n_seq, n_timesteps]
  • Features (tensor): [n_seq, n_timesteps, n_features]
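
A minimal sketch of what such a transform does, in plain pandas/numpy. The function and column names (to_padded_arrays, id, t, event, x) are illustrative, not the package's API:

    import numpy as np
    import pandas as pd

    def to_padded_arrays(df, feature_cols):
        """Long-format dataframe (one row per id & timestep) ->
        padded arrays of shape [n_seq, n_timesteps(, n_features)]."""
        n_timesteps = int(df['t'].max()) + 1
        ids = df['id'].unique()
        n_seq = len(ids)

        events = np.zeros((n_seq, n_timesteps))
        features = np.zeros((n_seq, n_timesteps, len(feature_cols)))
        for i, seq_id in enumerate(ids):
            seq = df[df['id'] == seq_id]
            ts = seq['t'].astype(int).values
            events[i, ts] = seq['event'].values
            features[i, ts, :] = seq[feature_cols].values

        # tte[i, t] = steps until the next event; when no further event
        # is seen, count steps to the end of the sequence and mark the
        # step as censored (u = 0).
        tte = np.zeros((n_seq, n_timesteps))
        u = np.zeros((n_seq, n_timesteps))
        for i in range(n_seq):
            steps = np.inf
            for t in reversed(range(n_timesteps)):
                steps = 0.0 if events[i, t] else steps + 1.0
                if np.isfinite(steps):
                    tte[i, t], u[i, t] = steps, 1.0
                else:
                    tte[i, t], u[i, t] = (n_timesteps - 1) - t, 0.0
        return events, u, tte, features

    # Toy usage: two sequences, one feature
    df = pd.DataFrame({'id': [0, 0, 0, 1, 1],
                       't': [0, 1, 2, 0, 1],
                       'event': [0, 1, 0, 0, 0],
                       'x': [1., 2., 3., 4., 5.]})
    events, u, tte, features = to_padded_arrays(df, ['x'])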


Alpha output activation function (keeps $\alpha > 0$; $\alpha_0$ is an initialization constant):

$$\alpha = \alpha_0 \exp(x_\alpha)$$

Beta output activation function (keeps $\beta > 0$):

$$\beta = \mathrm{softplus}(x_\beta)$$

Loss function: the log-likelihood of right-censored Weibull data (we minimize its negative),

$$\log L(\alpha, \beta \mid t, u) = u \left[ \log \beta - \log \alpha + (\beta - 1) \log \frac{t}{\alpha} \right] - \left( \frac{t}{\alpha} \right)^{\beta}$$
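
The continuous-time version of that loss written out in plain numpy, as a reference sketch (the repository ships Tensorflow/Keras versions):

    import numpy as np

    def weibull_neg_loglik(t, u, alpha, beta, eps=1e-9):
        """Negative log-likelihood of right-censored Weibull data.
        t = time to event, u = 1 if the event was observed / 0 if censored,
        alpha = scale > 0, beta = shape > 0; arrays broadcast together."""
        t = t + eps  # avoid log(0) when an event happens at t = 0
        log_hazard = np.log(beta) - np.log(alpha) + (beta - 1.0) * np.log(t / alpha)
        cum_hazard = (t / alpha) ** beta
        return -(u * log_hazard - cum_hazard)

For the discrete case the same idea applies, with the probability mass $p(t) = S(t) - S(t+1)$ in place of the density.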


Experiment:

  • Clickstream data* from a social-service & job-search site
  • 26k logged-in users
  • ~7M clicks

Features: sex, age, #clicks per day

*Dees, M.; van Dongen, B.F. (2016). BPI Challenge 2016. UWV. Dataset.

(Weird architecture, not recommended, but I wanted some smooth embeddings to show)


Predicted alpha ≈ predicted location (like in the normal distribution) “When”


Predicted beta ≈ predicted scale (like in the normal distribution) “How sure we are”
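
Both readings can be made concrete with standard Weibull formulas (plain Python, nothing package-specific):

    from math import gamma, log

    def weibull_mean(alpha, beta):
        # E[T] = alpha * Gamma(1 + 1/beta): roughly "when"
        return alpha * gamma(1.0 + 1.0 / beta)

    def weibull_quantile(alpha, beta, p):
        # Inverse CDF: t_p = alpha * (-ln(1 - p))^(1/beta)
        return alpha * (-log(1.0 - p)) ** (1.0 / beta)

    # A larger beta pulls the quantiles together, i.e. "more sure":
    print(weibull_quantile(10., 0.5, 0.9) - weibull_quantile(10., 0.5, 0.1))  # wide
    print(weibull_quantile(10., 5.0, 0.9) - weibull_quantile(10., 5.0, 0.1))  # narrow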


Temporal 2d-embeddings

  • Alpha ~ “When” (x-axis); Beta ~ “How sure we are” (y-axis)
  • Dots are colored red if moving to the right (i.e. the prediction is higher than yesterday)
  • Blue marks a user’s first day

[Plots: job-search/social-service clickstream data, and Tensorflow github commits]


Problems

Predicted TTE is stable but huge!

  • Dead users cause problems (bug or feature?)
  • The RNN learns the censoring (artifact learning)
  • Multiple (not very effective) solutions:
    • Clipping
    • Adding noise
    • Arbitrary weighting

Another solution that shows promise (sketched below):

  1. Overfit an identical binary “artifact learner” to predict censoring.
  2. Use its predictions to give lower weight to censored observations that are predictable as censored.
  3. Train the WTTE-RNN using the new observation weights.
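
A sketch of that weighting scheme on toy data. The binary artifact learner is assumed to be trained elsewhere, so its predicted probabilities are faked with random numbers here:

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.integers(0, 2, size=(4, 10)).astype(float)  # toy: 1 = event observed, 0 = censored

    # Step 1 (assumed done): a binary model trained to predict censoring;
    # p_censored ~ its predicted probability that a step is censored.
    p_censored = rng.uniform(size=u.shape)

    # Step 2: censored steps that the artifact learner finds predictable
    # get a low weight; observed events keep full weight.
    weights = np.where(u == 0, 1.0 - p_censored, 1.0)

    # Step 3: pass `weights` as per-timestep sample weights when fitting
    # the WTTE-RNN (in Keras: compile with sample_weight_mode='temporal',
    # then model.fit(..., sample_weight=weights)).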


Example: Machine failure

Machine run-to-failure experiments (Turbofan dataset)


[Plot: alpha vs beta]

[Plot: alpha vs beta vs time]

Conclusion

  • If you don’t have censoring, it’s just a really neat objective function
  • If you do have censoring, it works well
  • If you have a lot of censoring (dead sequences) you may see some unexpected (but potentially useful) results


Code


Py2 + Py3

  • Github repository
    • Tensorflow objective functions
    • Keras layers and helpers (sketched below)
    • Data transformations
    • General data pipeline
      • Parse any pandas dataframe (ID, Timestamp, ...)
    • Visualization
  • Documentation
  • Example implementations
    • Jupyter Notebooks
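
Roughly how the pieces fit together in Keras. This is a hand-rolled sketch using the loss derived earlier, not the package's exact helper names; see the repository's notebooks for the real API:

    import keras.backend as K
    from keras.models import Sequential
    from keras.layers import GRU, Dense, Lambda

    n_features = 3  # illustrative

    def output_activation(x):
        # Keep both Weibull parameters positive: alpha via exp, beta via softplus.
        alpha = K.exp(x[..., 0:1])
        beta = K.softplus(x[..., 1:2])
        return K.concatenate([alpha, beta], axis=-1)

    def weibull_loss(y_true, y_pred):
        # y_true[..., 0] = tte, y_true[..., 1] = u (1 observed, 0 censored)
        t = y_true[..., 0] + 1e-9
        u = y_true[..., 1]
        a, b = y_pred[..., 0], y_pred[..., 1]
        loglik = u * (K.log(b) - K.log(a) + (b - 1.) * K.log(t / a)) - K.pow(t / a, b)
        return -K.mean(loglik)

    model = Sequential([
        GRU(32, return_sequences=True, input_shape=(None, n_features)),
        Dense(2),                   # raw (pre-activation) alpha, beta per timestep
        Lambda(output_activation),  # map to positive alpha, beta
    ])
    model.compile(loss=weibull_loss, optimizer='adam')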


TODO

  • Fix & understand the artifact-learning problem
  • Continuous data (WIP but example not published)
    • Asynchronous predictions
  • Multivariate
  • Other distributions


> pip install wtte

Thank you



  1. Frame the problem:
  • Not a regression or classification problem
  2. Predict a distribution
  3. Use a clever loss from survival analysis
  4. Train RNNs
  5. Get useful predictions:
  • Current & future risk
  • Expected time to event
  • Interpretable 2d-embeddings

Jejucamp goals:

  • Make it better & scalable
  • Easier adoption
  • Extensions

Example : commits to the Tensorflow github repository


Survival methods math

  • $T \sim$ some distribution controlled by a parameter $\theta$ (the time to event)
  • $C$ : a (random) censoring time
  • $y = \min(T, C)$ (observed)
  • $u = \mathbb{1}\{T \le C\}$ (observed)
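
Together these give the standard censored log-likelihood that the loss function above is built from:

$$\log L(\theta \mid y, u) = u \log f_\theta(y) + (1 - u) \log S_\theta(y)$$

i.e. an observed point contributes its density, a censored point only the probability of surviving past $y$. Plugging in the Weibull density and survival function recovers the loss shown earlier.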