1 of 22

Logistic regression

2 of 22

How to get big data?

3 of 22

-Use public data (from government agencies)

-Collect using sensors

-Scrape from websites

Google Analytics, Amazon

4 of 22

Google Analytics

5 of 22

Scrape

6 of 22

Reviews: 9,505 on Amazon

Review analysis

7 of 22

8 of 22

9 of 22

Rocco & Roxie (analysis of data from only Amazon)

32oz

1 gallon

10 of 22

Logit model

Y=exp(1+x)/(1+exp(1+x))

Y settles near two distinct values across x: close to 0 for very negative x and close to 1 for large x
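A minimal sketch of the curve above in Python, using only the standard library. The function name `logit_curve` and the sample x values are chosen here for illustration and are not from the slides.

```python
import math

def logit_curve(x):
    """Y = exp(1 + x) / (1 + exp(1 + x)), the formula on the slide."""
    z = 1 + x
    return math.exp(z) / (1 + math.exp(z))

# Print Y for a few x values to see it move from near 0 to near 1.
for x in (-10, -5, -1, 0, 5, 10):
    print(x, round(logit_curve(x), 4))
```

At x = -1 the exponent is 0, so Y is exactly 0.5; far to either side Y flattens out toward 0 or 1.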

11 of 22

Titanic

12 of 22

13 of 22

14 of 22

15 of 22

Titanic

Y=exp(4.316+pclass+sex-0.038*age-0.332*sibsp+embarked)/(1+exp(4.316+pclass+sex-0.038*age-0.332*sibsp+embarked))

16 of 22

(25 years old, 3rd class, male, no siblings, no spouse, embarked at Queenstown)

Y=exp(4.316+pclass+sex-0.038*age-0.332*sibsp+embarked)/(1+exp(4.316+pclass+sex-0.038*age-0.332*sibsp+embarked))

Y=exp(4.316-2.069-2.633-0.038*25-0.332*0-1.471)/(1+exp(4.316-2.069-2.633-0.038*25-0.332*0-1.471))=exp(-2.807)/(1+exp(-2.807))=0.057, i.e. 5.7%
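The same calculation can be sketched in Python. The coefficient values are the ones that appear in the worked example (3rd class, male, Queenstown); the slides do not give coefficients for the other class, sex, or port levels, so this sketch only covers that one profile. The constant and function names are made up for illustration.

```python
import math

# Coefficients taken from the worked example on the slide.
INTERCEPT = 4.316
COEF_PCLASS_3 = -2.069    # 3rd class
COEF_SEX_MALE = -2.633    # male
COEF_AGE = -0.038         # per year of age
COEF_SIBSP = -0.332       # per sibling/spouse aboard
COEF_EMBARKED_Q = -1.471  # embarked at Queenstown

def survival_probability(age, sibsp):
    """Predicted probability for a 3rd-class male who embarked at Queenstown."""
    z = (INTERCEPT + COEF_PCLASS_3 + COEF_SEX_MALE
         + COEF_AGE * age + COEF_SIBSP * sibsp + COEF_EMBARKED_Q)
    return math.exp(z) / (1 + math.exp(z))

p = survival_probability(age=25, sibsp=0)
print(round(p, 3))  # 0.057
```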

17 of 22

Prediction

18 of 22

19 of 22

20 of 22

Evaluation

If pred_logit>0.5 -> Yes, otherwise -> No
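A small Python sketch of this cutoff rule. The function name `classify` and the `threshold` parameter are illustrative, and treating every other value (including exactly 0.5) as "No" is an assumption.

```python
def classify(pred_logit, threshold=0.5):
    """Turn a predicted probability into a Yes/No label."""
    # Strictly greater than the threshold counts as "Yes" (rule from the slide);
    # anything else is labeled "No" (assumption).
    return "Yes" if pred_logit > threshold else "No"

print(classify(0.057))  # No
print(classify(0.73))   # Yes
```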

21 of 22

Create val

22 of 22

Pivot->val
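One possible reading of the last two slides, sketched in Python: `val` marks whether each prediction matches the actual outcome, and the pivot counts those marks. This interpretation is an assumption, and the rows below are made-up illustrative values, not Titanic data.

```python
from collections import Counter

def classify(pred_logit, threshold=0.5):
    """Yes if the predicted probability exceeds the threshold, else No."""
    return "Yes" if pred_logit > threshold else "No"

# Hypothetical (actual outcome, predicted probability) pairs for illustration only.
rows = [("Yes", 0.91), ("No", 0.057), ("No", 0.62), ("Yes", 0.35), ("Yes", 0.78)]

# "Create val": mark each row as correct or wrong (assumed meaning of val).
val = ["correct" if classify(p) == actual else "wrong" for actual, p in rows]

# "Pivot->val": count how often each value of val occurs.
counts = Counter(val)
accuracy = counts["correct"] / len(rows)

print(dict(counts))
print(accuracy)
```

The share of "correct" rows is the accuracy of the model at the chosen cutoff.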