
Weibull Time To Event RNN

“An algorithm & philosophy about predicting when things will happen”

Egil Martinsson

ML Jeju Camp

2017-07-27

Mentor : Jeongkyu Shin

Help : Joongi Kim & Jonghyun Park

ragulpr.github.io

github.com/ragulpr/wtte-rnn

Thesis


About me

  • Independent consultant / machine learner / data scientist
  • BSc Statistics
  • Mathematics (California State University, Long Beach)
  • Master's program in Engineering Mathematics & Computational Science (Chalmers University of Technology)
  • Thesis : “WTTE-RNN : Weibull Time To Event Recurrent Neural Network”
  • Been working on and thinking about this problem for too long


Summary of ML Jeju Camp

Before

  • Thesis
  • Bunch of ideas
  • Github repository 0.0.2
  • Some simple example implementation
    • Tiny datasets, not scalable
    • Only discrete data of a certain shape

After

  • Production-ready 1.0.0 code
  • Technical advances
  • Tests
  • Examples


WTTE-RNN what?


Problem

  1. “What’s the probability that _____ stops?”
  2. “What’s the probability that _____ doesn’t happen within x days?”
  3. “How long until ______ happens?”
  4. “What’s the distribution over the time to the next event?”

“When will something happen?”

≈ “When will something stop happening?”

  • Billion dollar question!


Solution: WTTE-RNN

Let your machine learning model output the parameters of a distribution and train it with a magic loss function.

Why Weibull?

  • Because it’s awesome

Why RNN?

  • The machine learning algorithm can be anything gradient-based, but RNNs are awesome



  1. Frame the problem:
  • Not a regression or classification problem

Example : commits to the Tensorflow github repository

[Figure slides: worked example on the Tensorflow commit data]

Hacky solution : Binary

[Figure slides: illustrating the binary setup]

  • Annoying hyperparameters
  • Non-informative predictions
  • Sparsity
  • Hacky


Less Hacky solution : WTTE-RNN


“Clever loss function”


Why Weibull Distribution?

  • Continuous or discretized
  • Closed form (see below)
  • Shows up in nature
  • Regularization
  • Flexible
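
For reference, the closed forms behind those bullets (the standard two-parameter Weibull with scale $\alpha$ and shape $\beta$):

$$S(t) = e^{-(t/\alpha)^{\beta}}, \qquad \Lambda(t) = \left(\frac{t}{\alpha}\right)^{\beta}, \qquad f(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1} e^{-(t/\alpha)^{\beta}}$$

For the discretized version, the probability of an event in step $t$ is simply $p(t) = S(t) - S(t+1)$.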



  1. Frame the problem:
  • Not a regression or classification problem
  2. Predict a distribution
  3. Use a clever loss from survival analysis
  4. Train RNNs
  5. Get useful predictions:
  • Current & future risk
  • Expected time to event
  • Interpretable 2d-embeddings

Vision:

  • Simplicity
  • Generalizable

Example : commits to the Tensorflow github repository


Summary of ML Jeju Camp

Before

  • Thesis
  • Bunch of ideas
  • Github repository 0.0.2
  • Some simple example implementation
    • Tiny datasets, not scalable
    • Only discrete data of a certain shape

After

  • Production-level code
    • Documentation
    • Full data pipeline
    • 3x speedups
    • Rigorous testing
    • PyPI-installable package
    • Dev-friendly code structure
  • Technical advances
    • Scalable basic examples
    • Template to apply to any problem
    • Continuous/discrete/padded/unpadded data pipeline
    • Verified on GPU
    • Verified stability with large networks
  • Examples
    • Clickstream data, git logs, artificial data
    • More to come


Some real examples


Take any dataframe with (ID, Timestamp, Features) and transform it into what you need for training (sketch below):

  • Events (matrix): [n_seq, n_timesteps]
  • Censoring indicators (matrix): [n_seq, n_timesteps]
  • TTE (matrix): [n_seq, n_timesteps]
  • Features (tensor): [n_seq, n_timesteps, n_features]
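
A minimal sketch of what such a transform does, in plain pandas/numpy. The function and column names (to_padded_arrays, id, t, event, x) are illustrative, not the package's API:

    import numpy as np
    import pandas as pd

    def to_padded_arrays(df, feature_cols):
        """Long-format dataframe (one row per id & timestep) ->
        padded arrays of shape [n_seq, n_timesteps(, n_features)]."""
        n_timesteps = int(df['t'].max()) + 1
        ids = df['id'].unique()
        n_seq = len(ids)

        events = np.zeros((n_seq, n_timesteps))
        features = np.zeros((n_seq, n_timesteps, len(feature_cols)))
        for i, seq_id in enumerate(ids):
            seq = df[df['id'] == seq_id]
            ts = seq['t'].astype(int).values
            events[i, ts] = seq['event'].values
            features[i, ts, :] = seq[feature_cols].values

        # tte[i, t] = steps until the next event; when no further event
        # is seen, count steps to the end of the sequence and mark the
        # step as censored (u = 0).
        tte = np.zeros((n_seq, n_timesteps))
        u = np.zeros((n_seq, n_timesteps))
        for i in range(n_seq):
            steps = np.inf
            for t in reversed(range(n_timesteps)):
                steps = 0.0 if events[i, t] else steps + 1.0
                if np.isfinite(steps):
                    tte[i, t], u[i, t] = steps, 1.0
                else:
                    tte[i, t], u[i, t] = (n_timesteps - 1) - t, 0.0
        return events, u, tte, features

    # Toy usage: two sequences, one feature
    df = pd.DataFrame({'id': [0, 0, 0, 1, 1],
                       't': [0, 1, 2, 0, 1],
                       'event': [0, 1, 0, 0, 0],
                       'x': [1., 2., 3., 4., 5.]})
    events, u, tte, features = to_padded_arrays(df, ['x'])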


Alpha output activation function (keeps $\alpha > 0$; $\alpha_0$ is an initialization constant):

$$\alpha = \alpha_0 \exp(x_\alpha)$$

Beta output activation function (keeps $\beta > 0$):

$$\beta = \mathrm{softplus}(x_\beta)$$

Loss function: the log-likelihood of right-censored Weibull data (we minimize its negative),

$$\log L(\alpha, \beta \mid t, u) = u \left[ \log \beta - \log \alpha + (\beta - 1) \log \frac{t}{\alpha} \right] - \left( \frac{t}{\alpha} \right)^{\beta}$$
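
The continuous-time version of that loss written out in plain numpy, as a reference sketch (the repository ships Tensorflow/Keras versions):

    import numpy as np

    def weibull_neg_loglik(t, u, alpha, beta, eps=1e-9):
        """Negative log-likelihood of right-censored Weibull data.
        t = time to event, u = 1 if the event was observed / 0 if censored,
        alpha = scale > 0, beta = shape > 0; arrays broadcast together."""
        t = t + eps  # avoid log(0) when an event happens at t = 0
        log_hazard = np.log(beta) - np.log(alpha) + (beta - 1.0) * np.log(t / alpha)
        cum_hazard = (t / alpha) ** beta
        return -(u * log_hazard - cum_hazard)

For the discrete case the same idea applies, with the probability mass $p(t) = S(t) - S(t+1)$ in place of the density.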


Experiment:

  • Clickstream data* from a social-service & job-search site
  • 26k logged-in users
  • ~7M clicks

Features: sex, age, #clicks per day

*Dees, M.; van Dongen, B.F. (2016). BPI Challenge 2016. UWV. Dataset.

(Weird architecture, not recommended, but I wanted some smooth embeddings to show)


Predicted alpha ≈ predicted location (like in the normal distribution) “When”


Predicted beta ≈ predicted scale (like in the normal distribution) “How sure we are”
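
Both readings can be made concrete with standard Weibull formulas (plain Python, nothing package-specific):

    from math import gamma, log

    def weibull_mean(alpha, beta):
        # E[T] = alpha * Gamma(1 + 1/beta): roughly "when"
        return alpha * gamma(1.0 + 1.0 / beta)

    def weibull_quantile(alpha, beta, p):
        # Inverse CDF: t_p = alpha * (-ln(1 - p))^(1/beta)
        return alpha * (-log(1.0 - p)) ** (1.0 / beta)

    # A larger beta pulls the quantiles together, i.e. "more sure":
    print(weibull_quantile(10., 0.5, 0.9) - weibull_quantile(10., 0.5, 0.1))  # wide
    print(weibull_quantile(10., 5.0, 0.9) - weibull_quantile(10., 5.0, 0.1))  # narrow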


Temporal 2d-embeddings

  • Alpha ~ “When” (x-axis); Beta ~ “How sure we are” (y-axis)
  • Dots are colored red if moving to the right (i.e. the prediction is higher than yesterday)
  • Blue marks a user’s first day

[Plots: job-search/social-service clickstream data, and Tensorflow github commits]


Problems

Predicted TTE is stable but huge!

  • Dead users cause problems (bug or feature?)
  • The RNN learns the censoring (artifact learning)
  • Multiple (not very effective) solutions:
    • Clipping
    • Adding noise
    • Arbitrary weighting

Another solution that shows promise (sketched below):

  1. Overfit an identical binary “artifact learner” to predict censoring.
  2. Use its predictions to give lower weight to censored observations that are predictable as censored.
  3. Train the WTTE-RNN using the new observation weights.
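
A sketch of that weighting scheme on toy data. The binary artifact learner is assumed to be trained elsewhere, so its predicted probabilities are faked with random numbers here:

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.integers(0, 2, size=(4, 10)).astype(float)  # toy: 1 = event observed, 0 = censored

    # Step 1 (assumed done): a binary model trained to predict censoring;
    # p_censored ~ its predicted probability that a step is censored.
    p_censored = rng.uniform(size=u.shape)

    # Step 2: censored steps that the artifact learner finds predictable
    # get a low weight; observed events keep full weight.
    weights = np.where(u == 0, 1.0 - p_censored, 1.0)

    # Step 3: pass `weights` as per-timestep sample weights when fitting
    # the WTTE-RNN (in Keras: compile with sample_weight_mode='temporal',
    # then model.fit(..., sample_weight=weights)).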


Example: Machine failure

Machine run-to-failure experiments (Turbofan dataset)


[Plot: alpha vs beta]

[Plot: alpha vs beta vs time]

Conclusion

  • If you don’t have censoring, it’s just a really neat objective function
  • If you do have censoring, it works well
  • If you have a lot of censoring (dead sequences) you may see some unexpected (but potentially useful) results


Code


Py2 + Py3

  • Github repository
    • Tensorflow objective functions
    • Keras layers and helpers (sketched below)
    • Data transformations
    • General data pipeline
      • Parse any pandas dataframe (ID, Timestamp, ...)
    • Visualization
  • Documentation
  • Example implementations
    • Jupyter Notebooks
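
Roughly how the pieces fit together in Keras. This is a hand-rolled sketch using the loss derived earlier, not the package's exact helper names; see the repository's notebooks for the real API:

    import keras.backend as K
    from keras.models import Sequential
    from keras.layers import GRU, Dense, Lambda

    n_features = 3  # illustrative

    def output_activation(x):
        # Keep both Weibull parameters positive: alpha via exp, beta via softplus.
        alpha = K.exp(x[..., 0:1])
        beta = K.softplus(x[..., 1:2])
        return K.concatenate([alpha, beta], axis=-1)

    def weibull_loss(y_true, y_pred):
        # y_true[..., 0] = tte, y_true[..., 1] = u (1 observed, 0 censored)
        t = y_true[..., 0] + 1e-9
        u = y_true[..., 1]
        a, b = y_pred[..., 0], y_pred[..., 1]
        loglik = u * (K.log(b) - K.log(a) + (b - 1.) * K.log(t / a)) - K.pow(t / a, b)
        return -K.mean(loglik)

    model = Sequential([
        GRU(32, return_sequences=True, input_shape=(None, n_features)),
        Dense(2),                   # raw (pre-activation) alpha, beta per timestep
        Lambda(output_activation),  # map to positive alpha, beta
    ])
    model.compile(loss=weibull_loss, optimizer='adam')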


TODO

  • Fix & understand the artifact-learning problem
  • Continuous data (WIP but example not published)
    • Asynchronous predictions
  • Multivariate
  • Other distributions


> pip install wtte

Thank you



  1. Frame the problem:
  • Not a regression or classification problem
  2. Predict a distribution
  3. Use a clever loss from survival analysis
  4. Train RNNs
  5. Get useful predictions:
  • Current & future risk
  • Expected time to event
  • Interpretable 2d-embeddings

Jejucamp goals:

  • Make it better & scalable
  • Easier adoption
  • Extensions

Example : commits to the Tensorflow github repository


Survival methods math

  • $T \sim$ some distribution controlled by a parameter $\theta$ (the time to event)
  • $C$ : a (random) censoring time
  • $y = \min(T, C)$ (observed)
  • $u = \mathbb{1}\{T \le C\}$ (observed)
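
Together these give the standard censored log-likelihood that the loss function above is built from:

$$\log L(\theta \mid y, u) = u \log f_\theta(y) + (1 - u) \log S_\theta(y)$$

i.e. an observed point contributes its density, a censored point only the probability of surviving past $y$. Plugging in the Weibull density and survival function recovers the loss shown earlier.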