BASIC MACHINE LEARNING I
WHAT IS MACHINE LEARNING
FRAMEWORK OF MACHINE LEARNING
[Figure: machine learning framework. Labels: Training data, Training, Model (Black Box), Testing data, New query, Output]
TRAINING AND TESTING
MACHINE LEARNING TYPES
Patient classification
A predictor assigns a new case to an existing group.
[Figure: Protein expression, Train, Predictor. Caption: Use predictor to classify new cases]
[Figure: Protein expression, Train, Predictor, with the proteins that best distinguish the groups of patients (Protein A, Protein B). Caption: Use predictor to find a prognostic signature]
INPUT DATA
DATA BALANCE VS. IMBALANCE
EXAMPLE OF IMBALANCED DATASET
USING GOOD EVALUATION METRICS
| Patients/P | Normal/N |
Total | 10 | 990 |
Prediction | 1 | 999 |
True positive/negative | 1 | 990 |
False positive/negative | 0 | 9 |
Sensitivity | 1/10 = 10% | |
Specificity | 990/990 = 100% | |
Accuracy | (1 + 990) / (10 + 990) = 99.1% | |
Precision | 1/1 = 100% | |
Recall | 1/10 = 10% | |
F1 Score | (2 x 1) / (2 x 1 + 0 + 9) = 18.2% | |
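The figures in the table can be reproduced directly from the four confusion-matrix counts. The sketch below is only a check of the arithmetic; the variable names are illustrative and not part of the slides.

```python
# Counts from the table above: 10 patients (positive), 990 normal (negative).
tp, fn = 1, 9      # patients predicted as patient / predicted as normal
tn, fp = 990, 0    # normal predicted as normal / predicted as patient

sensitivity = tp / (tp + fn)                  # same quantity as recall
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = sensitivity
f1 = 2 * tp / (2 * tp + fp + fn)

for name, value in [
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("Accuracy", accuracy),
    ("Precision", precision),
    ("Recall", recall),
    ("F1 score", f1),
]:
    print(f"{name:<12}{value:.1%}")
```

Accuracy stays at 99.1% even though only 1 of the 10 patients is found, which is why recall and the F1 score are the more informative numbers for this dataset.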
OVER-SAMPLING (UP SAMPLING)
UNDER-SAMPLING (DOWN SAMPLING)
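A minimal sketch of random over-sampling and random under-sampling, assuming the 10 vs. 990 class sizes from the imbalanced example. The sample IDs, function names, and the fixed seed are placeholders for illustration; other resampling schemes exist and the slides do not say which one is intended.

```python
import random

def oversample(minority, majority, rng):
    """Random over-sampling: duplicate minority items (with replacement)
    until the minority class is as large as the majority class."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority

def undersample(minority, majority, rng):
    """Random under-sampling: keep a random subset of the majority class
    that is as large as the minority class."""
    return minority, rng.sample(majority, len(minority))

rng = random.Random(0)                        # fixed seed, for repeatability only
patients = [f"P{i}" for i in range(10)]       # placeholder minority samples
normals = [f"N{i}" for i in range(990)]       # placeholder majority samples

up_patients, up_normals = oversample(patients, normals, rng)
down_patients, down_normals = undersample(patients, normals, rng)

print(len(up_patients), len(up_normals))      # 990 990
print(len(down_patients), len(down_normals))  # 10 10
```

Over-sampling keeps every original sample but repeats minority cases; under-sampling discards most of the majority class.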
FEATURE SELECTION
FEATURE FILTER
FEATURE WRAPPER
FEATURE EMBEDDING
FEATURE WEIGHT
HYPERPARAMETERS
HYPERPARAMETER OPTIMIZATION
Fruit | Shape | Size | Smell | Color | Score |
| Circle | Middle | Scent | Red | |
| Ellipse | Big | Scent | Yellow | |
| Ellipse | Big | Odor | Yellow | |
| Circle | Middle | No smell | Green | |
| Circle | Small | No smell | Purple | |
HYPERPARAMETER OPTIMIZATION
Fruit | Shape | Size | Smell | Color | Score |
| 1 | 2 | 2 | 1 | 6 |
| 2 | 3 | 2 | 3 | 10 |
| 2 | 3 | 3 | 3 | 11 |
| 1 | 2 | 1 | 2 | 6 |
| 1 | 1 | 1 | 4 | 7 |
HYPERPARAMETER OPTIMIZATION
Fruit | Shape | Size | Smell | Color | Score |
| 1 x a | 2 x b | 2 x c | 1 x d | 6 |
| 2 x a | 3 x b | 2 x c | 3 x d | 10 |
| 2 x a | 3 x b | 3 x c | 3 x d | 11 |
| 1 x a | 2 x b | 1 x c | 2 x d | 6 |
| 1 x a | 1 x b | 1 x c | 4 x d | 7 |
HYPERPARAMETER OPTIMIZATION
Fruit | Shape | Size | Smell | Color | Score |
| 1 x 1 | 2 x 2 | 2 x 1 | 1 x 0.5 | 7.5 |
| 2 x 1 | 3 x 2 | 2 x 1 | 3 x 0.5 | 11.5 |
| 2 x 1 | 3 x 2 | 3 x 1 | 3 x 0.5 | 12.5 |
| 1 x 1 | 2 x 2 | 1 x 1 | 2 x 0.5 | 7 |
| 1 x 1 | 1 x 2 | 1 x 1 | 4 x 0.5 | 6 |
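The score columns in these tables are weighted sums of the encoded features. The sketch below recomputes them for unit weights and for the weights (1, 2, 1, 0.5) used in the last table; the function and variable names are illustrative.

```python
# Encoded features per fruit, in the column order Shape, Size, Smell, Color.
features = [
    (1, 2, 2, 1),
    (2, 3, 2, 3),
    (2, 3, 3, 3),
    (1, 2, 1, 2),
    (1, 1, 1, 4),
]

def score(row, weights):
    """Weighted sum of one fruit's encoded features."""
    return sum(x * w for x, w in zip(row, weights))

unit_weights = (1, 1, 1, 1)       # a = b = c = d = 1
tuned_weights = (1, 2, 1, 0.5)    # a, b, c, d from the last table

print([score(r, unit_weights) for r in features])   # [6, 10, 11, 6, 7]
print([score(r, tuned_weights) for r in features])  # [7.5, 11.5, 12.5, 7.0, 6.0]
```

Changing the weights changes the ranking of the fruits, which is the effect a hyperparameter search tries to tune.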
GRID SEARCHING
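A minimal sketch of grid search on the fruit example. Two things are assumptions made for illustration: the scores 6, 10, 11, 6, 7 from the table with weights a, b, c, d are treated as target values, and each weight is restricted to a hypothetical candidate list of 0.5, 1, and 2. Every combination is tried and the one with the smallest squared error is kept.

```python
from itertools import product

# Encoded features per fruit: Shape, Size, Smell, Color.
features = [
    (1, 2, 2, 1),
    (2, 3, 2, 3),
    (2, 3, 3, 3),
    (1, 2, 1, 2),
    (1, 1, 1, 4),
]
targets = [6, 10, 11, 6, 7]   # assumed target scores
grid = [0.5, 1, 2]            # hypothetical candidate values for each weight

def squared_error(weights):
    """Sum of squared differences between weighted scores and targets."""
    return sum(
        (sum(x * w for x, w in zip(row, weights)) - t) ** 2
        for row, t in zip(features, targets)
    )

candidates = list(product(grid, repeat=4))   # 3^4 = 81 combinations
best = min(candidates, key=squared_error)

print(len(candidates))             # 81
print(best, squared_error(best))   # (1, 1, 1, 1) 0
```

The cost of grid search grows exponentially with the number of hyperparameters: four weights with three candidates each already require 81 evaluations.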
RANDOM SEARCHING
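A minimal sketch of random search on the same fruit example. As assumptions for illustration, the scores 6, 10, 11, 6, 7 are treated as targets, each weight is drawn uniformly from the hypothetical range 0 to 3, and 200 trials are run with a fixed seed. Instead of enumerating a grid, the search samples weight combinations and keeps the best one seen.

```python
import random

# Encoded features per fruit: Shape, Size, Smell, Color.
features = [
    (1, 2, 2, 1),
    (2, 3, 2, 3),
    (2, 3, 3, 3),
    (1, 2, 1, 2),
    (1, 1, 1, 4),
]
targets = [6, 10, 11, 6, 7]   # assumed target scores

def squared_error(weights):
    """Sum of squared differences between weighted scores and targets."""
    return sum(
        (sum(x * w for x, w in zip(row, weights)) - t) ** 2
        for row, t in zip(features, targets)
    )

rng = random.Random(0)        # fixed seed, for repeatability only
n_trials = 200                # hypothetical evaluation budget
trials = [tuple(rng.uniform(0, 3) for _ in range(4)) for _ in range(n_trials)]
best = min(trials, key=squared_error)

print([round(w, 2) for w in best], round(squared_error(best), 3))
```

The budget is fixed in advance and does not depend on the number of hyperparameters, at the price of no guarantee that any particular combination is visited.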
BAYESIAN SEARCHING
BAYESIAN OPTIMIZATION
ID | Age | Gender | Income | Married | Children | Buy |
1 | 33 | M | 5.3W | Y | Y | Y |
2 | 37 | F | 4.5W | Y | Y | Y |
3 | 22 | F | 3W | N | N | N |
4 | 35 | M | 7.8W | Y | Y | Y |
5 | 60 | M | 6W | N | Y | N |
6 | 63 | F | 5.5W | Y | N | N |
7 | 55 | F | 6W | Y | Y | N |
8 | 18 | F | 2.8W | Y | N | N |
9 | 21 | M | 3.5W | N | N | N |
10 | 46 | F | 7W | N | N | N |
BAYESIAN OPTIMIZATION
ID | Age | Gender | Income | Married | Children | Buy |
1 | 33 | M | 5.3W | Y | Y | Y |
2 | 37 | F | 4.5W | Y | Y | Y |
3 | 22 | F | 3W | N | N | N |
4 | 35 | M | 7.8W | Y | Y | Y |
5 | 60 | M | 6W | N | Y | N |
6 | 63 | F | 5.5W | Y | N | N |
7 | 55 | F | 6W | Y | Y | N |
8 | 18 | F | 2.8W | Y | N | N |
9 | 21 | M | 3.5W | N | N | N |
10 | 46 | F | 7W | N | N | N |
11 | 38 | M | 8.3W | Y | Y | |
3/10 ?
BAYESIAN OPTIMIZATION
ID | Age | Gender | Income | Married | Children | Buy |
1 | 33 | M | 5.3W | Y | Y | Y |
2 | 37 | F | 6.5W | Y | Y | Y |
4 | 35 | M | 7.8W | Y | Y | Y |
3 | 22 | F | 3W | N | N | N |
5 | 60 | M | 5W | N | Y | N |
6 | 63 | F | 5.5W | Y | N | N |
7 | 55 | F | 6W | Y | Y | N |
8 | 18 | F | 2.8W | Y | N | N |
9 | 21 | M | 3.5W | N | N | N |
10 | 46 | F | 7W | N | N | N |
11 | 38 | M | 8.3W | Y | Y | Maybe Y |
3/10 ?
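The "3/10" is the share of buyers among the ten known customers, that is, the prior probability of a purchase before looking at customer 11. The slides do not state how "Maybe Y" was reached; one possible reading, assumed here, is to update that prior using only the customers who match customer 11 on the marriage and children columns. This sketch covers only the prior-to-posterior step, not a full Bayesian optimization loop.

```python
# (married, children, buy) for the ten known customers, copied from the table.
rows = [
    ("Y", "Y", "Y"),  # ID 1
    ("Y", "Y", "Y"),  # ID 2
    ("Y", "Y", "Y"),  # ID 4
    ("N", "N", "N"),  # ID 3
    ("N", "Y", "N"),  # ID 5
    ("Y", "N", "N"),  # ID 6
    ("Y", "Y", "N"),  # ID 7
    ("Y", "N", "N"),  # ID 8
    ("N", "N", "N"),  # ID 9
    ("N", "N", "N"),  # ID 10
]

# Prior: fraction of buyers with no other information.
prior = sum(buy == "Y" for _, _, buy in rows) / len(rows)

# Posterior (assumed evidence): customers who, like ID 11, are married with children.
similar = [buy for married, children, buy in rows if married == "Y" and children == "Y"]
posterior = similar.count("Y") / len(similar)

print(prior, posterior)   # 0.3 0.75
```

With the evidence taken into account, the estimate moves from 30% to 75%, which is consistent with the "Maybe Y" entry for customer 11.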
COST-SENSITIVE LEARNING
Predicted class | Actual: Positive class | Actual: Negative class |
Positive class | TP = 990 | FP = 9 |
Negative class | FN = 0 | TN = 1 |
Predicted class | Actual: Positive class | Actual: Negative class |
Positive class | TP = 990 Cost | FP = 9 Cost |
Negative class | FN = 0 Cost | TN = 1 Cost |
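A minimal sketch of how a cost matrix changes the evaluation. The counts come from the table; the unit costs are hypothetical, since the slides only indicate that each cell carries a cost without giving values.

```python
# Confusion-matrix counts from the table above.
counts = {"TP": 990, "FP": 9, "FN": 0, "TN": 1}

# Hypothetical unit costs: correct predictions are free,
# and a false positive is assumed to be 10 times as costly as a false negative.
costs = {"TP": 0, "FP": 10, "FN": 1, "TN": 0}

total_cost = sum(counts[cell] * costs[cell] for cell in counts)
error_count = counts["FP"] + counts["FN"]

print(error_count, total_cost)   # 9 90
```

Counting errors treats all nine mistakes alike, while the cost-weighted total makes the expensive kind of error dominate, which is what a cost-sensitive learner is trained to minimize.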
ENSEMBLE LEARNING
STACKING
BAGGING
BOOSTING
A boy invites a girl on a date. She asks her friends whether she should go out with him. Afterwards, she decides to date the boy (Stacking).
Several months pass, and the boy wants to buy the girl a present but has no idea which one would be best. He picks a few candidates (a ring, a necklace, and a bag) and asks his friends for their opinions. Since most of them choose the bag, he buys the girl a nice bag (Bagging).
The girl hates the bag and feels the boy doesn't love her, so she breaks up with him. Because of this, the boy will never again buy a bag as a present for a girlfriend (Boosting).
BAGGING VS. STACKING VS. BOOSTING
MULTIPLE CLASSIFIERS
Conditions:
ADABOOST
Initial stage
First classifier
Error rate = 0.25
Updated weight
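A minimal sketch of one AdaBoost round for an error rate of 0.25, using the common discrete formulation with classifier weight alpha = 0.5 * ln((1 - error) / error). The slides do not give the formula or the number of samples, so the 8 samples with 2 misclassified are hypothetical values chosen to produce the stated error rate.

```python
import math

n_samples = 8                                  # hypothetical sample count
weights = [1 / n_samples] * n_samples          # initial stage: uniform weights
misclassified = [True, True] + [False] * 6     # hypothetical: first classifier gets 2 of 8 wrong

# Weighted error rate of the first classifier.
error = sum(w for w, wrong in zip(weights, misclassified) if wrong)

# Classifier weight: larger when the error is smaller.
alpha = 0.5 * math.log((1 - error) / error)

# Raise the weights of misclassified samples, lower the rest, then normalize.
updated = [
    w * math.exp(alpha if wrong else -alpha)
    for w, wrong in zip(weights, misclassified)
]
total = sum(updated)
updated = [w / total for w in updated]

print(error)                             # 0.25
print(round(alpha, 3))                   # 0.549
print([round(w, 4) for w in updated])
```

After the update the two misclassified samples carry half of the total weight, so the next classifier is pushed to get them right.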
OVERFITTING
HOW TO SOLVE OVERFITTING
REGULARIZATION
DROPOUT
VALIDATION
HOLDOUT WITH TRAIN-TEST
K-FOLD CROSS VALIDATION
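A minimal sketch of k-fold splitting, assuming contiguous folds without shuffling. The function name and the choice of 10 samples and 5 folds are illustrative. Each sample lands in the test set exactly once and in the training set for the other k - 1 rounds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("test:", test, "train:", train)
```

In practice the data is usually shuffled (or stratified by class) before splitting; that step is left out here to keep the index logic visible.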
LEAVE-ONE-OUT CROSS VALIDATION
LEAVE-ONE-GROUP-OUT CROSS VALIDATION
NESTED CROSS VALIDATION
TIME SERIES CROSS VALIDATION
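A minimal sketch of time series cross validation with an expanding training window, assuming samples are already in chronological order. The function name and the sizes (10 samples, 3 splits, 2 test samples per split) are illustrative. Unlike k-fold, the training indices always precede the test indices, so the model never sees the future.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train, test) index lists where training data always precedes test data."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        start = first_test_start + i * test_size
        train = list(range(start))                     # everything before the test block
        test = list(range(start, start + test_size))   # the next block in time
        yield train, test

splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
for train, test in splits:
    print("train:", train, "test:", test)
```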
COMPARING MODELS
WILCOXON SIGNED-RANK TEST
MCNEMAR’S TEST
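A minimal sketch of McNemar's test with continuity correction for comparing two classifiers on the same test set. Here b is the number of samples only model A classifies correctly and c the number only model B classifies correctly; the counts 15 and 5 are hypothetical. The statistic is compared against 3.841, the chi-squared critical value for 1 degree of freedom at the 5% level.

```python
def mcnemar_statistic(b, c):
    """Continuity-corrected McNemar chi-squared statistic (1 degree of freedom)."""
    if b + c == 0:
        raise ValueError("the two models never disagree; the test is undefined")
    return (abs(b - c) - 1) ** 2 / (b + c)

# Hypothetical disagreement counts between two models on one test set.
b = 15   # model A correct, model B wrong
c = 5    # model A wrong, model B correct

CRITICAL_VALUE_5_PERCENT = 3.841   # chi-squared, 1 degree of freedom

statistic = mcnemar_statistic(b, c)
significant = statistic > CRITICAL_VALUE_5_PERCENT

print(statistic, significant)   # 4.05 True
```

Only the samples on which the two models disagree enter the statistic; samples both models get right or both get wrong carry no information about which model is better.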
5X2CV PAIRED T-TEST