Logistic regression
How to get big data?
-Use public data (from government agencies)
-Collect using sensors
-Scrape from websites
Google Analytics, Amazon
Google Analytics
Scrape
Reviews: 9,505 on Amazon
Review analysis
Rocco & Roxie (analysis of Amazon data only)
32oz
1 gallon
Logit model
Y=exp(1+x)/(1+exp(1+x))
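Y always stays between 0 and 1: it is close to 0 for very negative x and close to 1 for very positive x, with an S-shaped transition in between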
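A minimal Python sketch of the logit model above, evaluated at a few x values to show how Y flattens out near 0 and near 1. The function names logistic and logit_model are made up for this illustration.

```python
import math

def logistic(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return math.exp(z) / (1.0 + math.exp(z))

def logit_model(x):
    # Y = exp(1 + x) / (1 + exp(1 + x)), the example from the notes
    return logistic(1.0 + x)

# Evaluate the model over a range of x values
for x in (-10, -5, -1, 0, 5, 10):
    print(f"x = {x:>3}  Y = {logit_model(x):.4f}")
```

At x = -1 the exponent is 0, so Y is exactly 0.5; far below that Y is nearly 0, far above it Y is nearly 1.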
Titanic
Y=exp(4.316+pclass+sex-0.038*age-0.332*sibsp+embarked)/(1+exp(4.316+pclass+sex-0.038*age-0.332*sibsp+embarked))
(Example passenger: 25 years old, 3rd class, male, no siblings or spouse aboard, embarked at Queenstown)
Y=exp(4.316-2.069-2.633-0.038*25-0.332*0-1.471)/(1+exp(4.316-2.069-2.633-0.038*25-0.332*0-1.471))
=exp(-2.807)/(1+exp(-2.807))
=0.057 (5.7%)
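The worked example can be checked with a short Python sketch. The coefficient values are copied from the calculation above. Matching -2.069 to 3rd class, -2.633 to male, and -1.471 to Queenstown is an assumption based on the order of the terms, and the names used here are made up for this illustration.

```python
import math

# Coefficients as they appear in the worked example in these notes.
INTERCEPT = 4.316
COEF_PCLASS_3 = -2.069    # assumed: term for 3rd class
COEF_SEX_MALE = -2.633    # assumed: term for male
COEF_AGE = -0.038         # per year of age
COEF_SIBSP = -0.332       # per sibling/spouse aboard
COEF_EMBARKED_Q = -1.471  # assumed: term for Queenstown

def predicted_probability(age, sibsp):
    """Logit prediction for a 3rd-class male who embarked at Queenstown."""
    z = (INTERCEPT + COEF_PCLASS_3 + COEF_SEX_MALE
         + COEF_AGE * age + COEF_SIBSP * sibsp + COEF_EMBARKED_Q)
    return math.exp(z) / (1.0 + math.exp(z))

p = predicted_probability(age=25, sibsp=0)
print(round(p, 3))  # prints 0.057
```

Because sibsp is 0 for this passenger, the -0.332*sibsp term contributes nothing to the exponent.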
Prediction
Evaluation
If pred_logit > 0.5 -> Yes; otherwise -> No
Create val
Pivot->val
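A rough Python sketch of the evaluation step, under one possible reading of the notes: apply the 0.5 cutoff to each predicted probability, then count how predictions line up with actual outcomes, much as a pivot table would. The rows below are hypothetical values made up for illustration, not Titanic data, and the names classify, rows, and counts are also assumptions.

```python
def classify(prob, threshold=0.5):
    """Turn a predicted probability into a Yes/No label using the cutoff."""
    return "Yes" if prob > threshold else "No"

# Hypothetical (predicted probability, actual outcome) pairs, for illustration only
rows = [(0.91, "Yes"), (0.057, "No"), (0.62, "No"), (0.35, "Yes"), (0.12, "No")]

# Count each (actual, predicted) combination, like a pivot table
counts = {}
for prob, actual in rows:
    key = (actual, classify(prob))
    counts[key] = counts.get(key, 0) + 1

# Share of rows where the prediction matches the actual outcome
correct = sum(n for (actual, pred), n in counts.items() if actual == pred)
accuracy = correct / len(rows)

for (actual, pred), n in sorted(counts.items()):
    print(f"actual={actual:<3} predicted={pred:<3} count={n}")
print(f"accuracy = {accuracy:.2f}")
```

The off-diagonal counts (actual Yes predicted No, and actual No predicted Yes) show which kind of error the model makes, which a single accuracy figure hides.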