Introduction to Data Science
By
S.V.V.D.Jagadeesh
Sr. Assistant Professor
Dept of Artificial Intelligence & Data Science
LAKIREDDY BALI REDDY COLLEGE OF ENGINEERING
S.V.V.D.Jagadeesh
Thursday, January 2, 2025
Previously Discussed Topics
LBRCE
IDS
Session Outcomes
At the end of this session, students will be able to:
Machine Learning Modeling Process
1 Feature engineering and model selection
2 Training the model
3 Model validation and selection
4 Applying the trained model to unseen data
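The four steps of the modeling process can be sketched end to end in a few lines. This is a minimal illustration, not the method used in the course: the study-time records, the straight-line model, and the helper names (`to_hours`, `fit_line`, `predict`, `mean_squared_error`) are all assumptions made up for this example.

```python
from statistics import mean

# Hypothetical raw records, invented for this sketch.
records = [
    {"minutes_studied": 60, "score": 52.0},
    {"minutes_studied": 120, "score": 54.0},
    {"minutes_studied": 180, "score": 56.0},
    {"minutes_studied": 240, "score": 58.0},
    {"minutes_studied": 300, "score": 60.0},
    {"minutes_studied": 360, "score": 62.0},
]

def to_hours(record):
    """Step 1: feature engineering. Turn raw minutes into hours."""
    return record["minutes_studied"] / 60.0

def fit_line(xs, ys):
    """Step 2: training. Least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return y_bar - slope * x_bar, slope

def predict(model, x):
    intercept, slope = model
    return intercept + slope * x

def mean_squared_error(model, xs, ys):
    """Step 3: validation. Average squared error on rows not used in training."""
    return mean((predict(model, x) - y) ** 2 for x, y in zip(xs, ys))

xs = [to_hours(r) for r in records]
ys = [r["score"] for r in records]

# Train on the first four rows, validate on the last two.
model = fit_line(xs[:4], ys[:4])
validation_error = mean_squared_error(model, xs[4:], ys[4:])

# Step 4: apply the trained model to an observation it has never seen.
unseen_prediction = predict(model, 7.5)
print(model, validation_error, unseen_prediction)
```

In practice steps 1 to 3 are repeated: if the validation error is poor, you revisit the features or try another model before moving on to step 4.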
Feature Engineering And Model Selection
Training the Model
Validating a Model
■ Holdout split: divide your data into a training set with X% of the observations and keep the rest as a holdout data set (a data set that is never used for model creation). This is the most common technique.
■ K-folds cross validation: this strategy divides the data set into k parts and uses each part once as a test data set while using the remaining parts as a training data set.
This has the advantage that you use all the data available in the data set.
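Both strategies can be sketched with the standard library alone. The example below is an illustration under stated assumptions: the toy data, the 80% training fraction, k = 5, and the deliberately simple "predict the training mean" stand-in model are all hypothetical choices, as are the helper names.

```python
import random
from statistics import mean

def holdout_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows, then split it into (training set, holdout set)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def k_fold_splits(rows, k):
    """Yield (training set, test set) k times; each row is tested exactly once."""
    folds = [rows[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test

def fit_mean(train):
    """Stand-in model: always predict the mean target of the training rows."""
    return mean(y for _, y in train)

def mse(prediction, test):
    """Mean squared error of a constant prediction on the test rows."""
    return mean((y - prediction) ** 2 for _, y in test)

# Hypothetical toy data: (feature, target) pairs invented for this sketch.
rows = [(x, 2.0 * x + 1.0) for x in range(10)]

# Holdout: the holdout rows play no part in fitting the model.
train, holdout = holdout_split(rows, train_fraction=0.8)
holdout_error = mse(fit_mean(train), holdout)

# K-folds: every row is used for testing once and for training k - 1 times.
fold_errors = [mse(fit_mean(tr), te) for tr, te in k_fold_splits(rows, k=5)]
cv_error = mean(fold_errors)
print(holdout_error, cv_error)
```

The holdout estimate depends on which rows happen to land in the holdout set, while the k-folds estimate averages over k different test sets at the cost of training the model k times.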
■ Leave-one-out: this approach is the same as k-folds with k equal to the number of observations, so every fold holds exactly one observation.
You always leave one observation out and train on the rest of the data.
This is used only on small data sets, so it’s more valuable to people evaluating laboratory experiments than to big data analysts.
■ Regularization is another popular term in machine learning.
When applying regularization, you incur a penalty for every extra variable used to construct the model, which discourages needlessly complex models.
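The two ideas on this slide can be sketched together. The slide does not name a specific penalty, so the example assumes an L1 penalty (the sum of absolute weights), which is one common form of regularization; the coordinate descent fit, the toy data, and all function names are hypothetical and chosen only for illustration.

```python
def leave_one_out_splits(rows):
    """Yield (training set, single held-out row) once per observation."""
    for i in range(len(rows)):
        yield rows[:i] + rows[i + 1:], rows[i]

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; small values become exactly 0.0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def fit_l1(rows, penalty, sweeps=100):
    """Coordinate descent for 0.5 * sum(squared error) + penalty * sum(|weight|).

    Each row is (features, target); the model has no intercept.
    """
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    for _ in range(sweeps):
        for j in range(n_features):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                x[j] * (y - sum(w * v for k, (w, v) in enumerate(zip(weights, x)) if k != j))
                for x, y in rows
            )
            scale = sum(x[j] ** 2 for x, _ in rows)
            weights[j] = soft_threshold(rho, penalty) / scale
    return weights

def predict(weights, x):
    return sum(w * v for w, v in zip(weights, x))

def loo_error(rows, penalty):
    """Mean squared error over all leave-one-out splits."""
    errors = [
        (predict(fit_l1(train, penalty), x) - y) ** 2
        for train, (x, y) in leave_one_out_splits(rows)
    ]
    return sum(errors) / len(errors)

# Hypothetical toy data: the target depends strongly on the first feature
# and only very weakly on the second.
rows = [
    ((1.0, 1.0), 3.1),
    ((-1.0, 1.0), -2.9),
    ((1.0, -1.0), 2.9),
    ((-1.0, -1.0), -3.1),
]

unpenalized = fit_l1(rows, penalty=0.0)  # keeps both variables
penalized = fit_l1(rows, penalty=1.0)    # drops the weak second variable
splits = list(leave_one_out_splits(rows))
error_with_penalty = loo_error(rows, penalty=1.0)
print(unpenalized, penalized, len(splits), error_with_penalty)
```

With the penalty switched on, the weight of the weak second variable is set to zero, so the model pays for only one variable. The leave-one-out loop trains once per observation, which is why the approach is practical only for small data sets.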
Predicting New Observations
Summary