1 of 34

20/10/2016 @ TakeOffConf

Fabien VAUCHELLES

zelros.com / fabien.vauchelles@zelros.com / @fabienv

http://bit.ly/mltakeoff

2 of 34

FABIEN VAUCHELLES

Developer for 16 years

CTO of Zelros

Expert in data extraction (webscraping)

Creator of Scrapoxy.io

3 of 34

INTRODUCTION

4 of 34

I WILL TELL YOU THE TRUTH

5 of 34

JACK KNEW HE WOULD DIE

6 of 34

JACK MET A PSYCHIC

7 of 34

LIFE DECISION PATH

JACK

8 of 34

LIFE DECISION PATH

Men

Women

JACK

9 of 34

LIFE DECISION PATH

Men

Women

JACK

Child

Adult

10 of 34

LIFE DECISION PATH

Men

Women

Child

Adult

1st class

3rd class

JACK

11 of 34

LIFE DECISION PATH

Men

Women

Child

Adult

1st class

3rd class

GAME OVER

JACK

70%

12 of 34

What type of problem is this?

13 of 34

SUPERVISED LEARNING

Men

Women

Child

Adult

1st class

3rd class

GAME OVER

JACK

70%
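A decision path like this one can be sketched as a nested structure that is walked one question at a time. The tree shape below is an assumption based on the order in which the slides reveal the splits (sex, then age group, then ticket class). Placing Jack on the man / adult / 3rd class branch is also an assumption. Only the 70% figure comes from the slide; every other leaf is left as `None` rather than invented, and all key names are hypothetical.

```python
# Assumed tree shape: sex -> age group -> ticket class.
# Only the 70% leaf appears on the slide; unknown leaves stay None.
TREE = {
    "question": "sex",
    "branches": {
        "man": {
            "question": "age_group",
            "branches": {
                "adult": {
                    "question": "ticket_class",
                    "branches": {"1st": None, "3rd": 0.70},
                },
                "child": None,
            },
        },
        "woman": None,
    },
}


def follow_path(tree, person):
    """Walk the tree by answering one question per level; return the path and the leaf."""
    node = tree
    path = []
    while isinstance(node, dict):
        answer = person[node["question"]]
        path.append(f'{node["question"]}={answer}')
        node = node["branches"][answer]
    return path, node


# Assumption: Jack is an adult man travelling in 3rd class.
JACK = {"sex": "man", "age_group": "adult", "ticket_class": "3rd"}
path, outcome = follow_path(TREE, JACK)
print(" -> ".join(path), "->", outcome)
```

In supervised learning the tree is not written by hand as it is here: the questions and the leaf values are learned from labeled examples.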

14 of 34

UNSUPERVISED LEARNING

age

satisfaction
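As a rough illustration of unsupervised learning on these two axes, the sketch below groups unlabeled (age, satisfaction) points with a minimal k-means loop. The six data points and the choice of two clusters are hypothetical and not taken from the talk. The starting centroids are fixed so that the run is deterministic.

```python
# Hypothetical (age, satisfaction) points; no labels are given to the algorithm.
POINTS = [(22, 8), (25, 9), (24, 7), (58, 3), (61, 2), (60, 4)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and moving centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            centroid(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Deterministic start: the first and the last point serve as initial centroids.
centroids, clusters = kmeans(POINTS, [POINTS[0], POINTS[-1]])
print(clusters)
```

Because age spans a much wider range than satisfaction, age dominates the distance here; real data would usually be rescaled first.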

16 of 34

How to start an ML problem

17 of 34

THE PROCESS OF DATA SCIENCE

1. ANALYZE DATA
2. TRAIN THE MODEL
3. VALIDATE THE PREDICTION

20 of 34

ANALYZE DATA

21 of 34

DEMO

22 of 34

TRAIN THE MODEL

23 of 34

What can we predict?

24 of 34

CLASSIFICATION

Did the passenger survive the Titanic?

NAME      AGE   CLASS   DIED?
John      23    3       Yes
Mary      31    1       No
Henry     23    2       Yes
Nicolas   41    1       No
Anna      18    3       Yes
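To make the classification idea concrete, here is a minimal sketch of a one-question decision tree (a "decision stump") fitted to the five rows of the table above. It tries every rule of the form "feature >= threshold" or "feature < threshold" and keeps the one with the fewest mistakes. The function names are hypothetical, and five rows are far too few to learn anything reliable; this only shows the mechanics.

```python
# (age, ticket class, died?) copied from the table on the slide.
ROWS = [(23, 3, True), (31, 1, False), (23, 2, True), (41, 1, False), (18, 3, True)]
FEATURES = {"age": 0, "class": 1}


def fit_stump(rows):
    """Return (errors, feature, threshold, died_if_at_least) for the best single rule."""
    best = None
    for name, col in FEATURES.items():
        for threshold in sorted({row[col] for row in rows}):
            for died_if_at_least in (True, False):
                # Predicted "died" is (value >= threshold) when died_if_at_least
                # is True, and (value < threshold) otherwise.
                errors = sum(
                    ((row[col] >= threshold) == died_if_at_least) != row[2]
                    for row in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, name, threshold, died_if_at_least)
    return best


def predict(stump, age, ticket_class):
    """Apply the fitted rule to one passenger."""
    _, name, threshold, died_if_at_least = stump
    value = (age, ticket_class)[FEATURES[name]]
    return (value >= threshold) == died_if_at_least


stump = fit_stump(ROWS)
print(stump)
```

On this tiny table more than one rule separates the two groups perfectly, so the rule that is returned depends on the search order. That is a sign that more data is needed before trusting it.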

25 of 34

REGRESSION

Find the house price:

Surface (m²)   Rooms   Bedrooms   Garden (m²)   Price (€)
200            5       2          200           500 000
100            3       1          0             200 000
300            5       2          300           800 000
150            4       2          100           300 000
200            4       1          200           ?
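One simple way to fill in the missing price is a straight-line fit of price against surface. The sketch below uses only the surface column of the four priced rows and ignores rooms, bedrooms and garden. That simplification is an assumption made to keep the example short, and the function name is hypothetical.

```python
# Surface (m²) and price (€) of the four houses with a known price.
SURFACES = [200, 100, 300, 150]
PRICES = [500_000, 200_000, 800_000, 300_000]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(SURFACES, PRICES)
estimate = intercept + slope * 200  # the last row has a surface of 200 m²
print(round(estimate))
```

With these four points the estimate comes out at roughly 489 000 €, close to the 500 000 € of the other 200 m² house. A fuller model would also use the remaining columns.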

26 of 34

ALGORITHMS

DECISION TREE
  • Random Forest
  • Gradient Boosting
  • XGBoost

DEEP LEARNING
  • Convolutional Neural Network
  • Deep Boltzmann Machine
  • Recurrent Neural Network

LINEAR
  • Linear Regression
  • Logistic Regression
  • Perceptron

CLUSTERING
  • k-Means
  • k-Medians
  • Hierarchical Clustering

BAYESIAN
  • Gaussian Naive Bayes
  • Multinomial Naive Bayes
  • Bayesian Network

NLP
  • TF-IDF
  • Word2Vec

27 of 34

What is Linear Regression?

28 of 34

DEMO

29 of 34

VALIDATE THE PREDICTION

30 of 34

MINIMISE ERROR

$h(X) = \theta_0 + \theta_1 X$

real value

predicted value
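Reading the hypothesis as an intercept plus a slope times X, "minimising the error" can be sketched as gradient descent on the mean squared difference between predicted and real values. The toy data, the learning rate and the step count below are hypothetical choices for illustration and are not taken from the talk.

```python
# Hypothetical toy data generated from y = 1 + 2x.
XS = [1.0, 2.0, 3.0, 4.0]
YS = [3.0, 5.0, 7.0, 9.0]


def predict(theta0, theta1, x):
    """Straight-line hypothesis: intercept plus slope times x."""
    return theta0 + theta1 * x


def mean_squared_error(theta0, theta1):
    """Average squared gap between predicted and real values."""
    return sum((predict(theta0, theta1, x) - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def gradient_descent(learning_rate=0.05, steps=5000):
    """Repeatedly move both parameters against the gradient of the error."""
    theta0, theta1 = 0.0, 0.0
    m = len(XS)
    for _ in range(steps):
        errors = [predict(theta0, theta1, x) - y for x, y in zip(XS, YS)]
        grad0 = 2 * sum(errors) / m
        grad1 = 2 * sum(e * x for e, x in zip(errors, XS)) / m
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


theta0, theta1 = gradient_descent()
print(theta0, theta1, mean_squared_error(theta0, theta1))
```

The parameters converge towards an intercept of 1 and a slope of 2, the values used to generate the toy data. A learning rate that is too large would make the updates diverge instead.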

31 of 34

MINIMISE ERROR

$h(X) = \theta_0 + \theta_1 X + \theta_2 X^2$

real value

predicted value
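Adding a squared term lets the hypothesis bend, which can shrink the error when the data is curved. The sketch below compares a straight line and a quadratic on hypothetical points that lie exactly on y = x². The straight line shown is the least-squares line for those four points, and all names are illustrative.

```python
# Hypothetical curved data: y = x² exactly.
XS = [0.0, 1.0, 2.0, 3.0]
YS = [0.0, 1.0, 4.0, 9.0]


def mean_squared_error(hypothesis):
    """Average squared gap between the hypothesis and the real values."""
    return sum((hypothesis(x) - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def linear(x):
    # Least-squares straight line for these four points: -1 + 3x.
    return -1.0 + 3.0 * x


def quadratic(x):
    # Intercept 0, linear coefficient 0, squared coefficient 1.
    return 0.0 + 0.0 * x + 1.0 * x ** 2


linear_error = mean_squared_error(linear)
quadratic_error = mean_squared_error(quadratic)
print(linear_error, quadratic_error)
```

The quadratic reaches zero error here only because the data was built that way. On real data a more flexible hypothesis can also fit noise, which is why predictions are validated on examples the model has not seen.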

32 of 34

DEMO

33 of 34

RESOURCES

  • Coursera Machine Learning

https://www.coursera.org/learn/machine-learning

  • Kaggle

http://www.kaggle.com

34 of 34

ANY QUESTIONS?

zelros.com / fabien.vauchelles@zelros.com / @fabienv

http://bit.ly/mltakeoff