Wine Quality Dataset
Individual project – Beginner group
Yujin Kim (01815358)
Contents
1. Introduction to Wine Quality Dataset
Challenges
Goal
- Creating a classification model to predict wine quality
1. Introduction to Dataset
No. | Column |
1 | Fixed acidity |
2 | Volatile acidity |
3 | Citric acid |
4 | Residual sugar |
5 | Chlorides |
6 | Free sulfur dioxide |
7 | Total sulfur dioxide |
8 | Density |
9 | pH |
10 | Sulphates |
11 | Alcohol |
12 | Quality (score between 0 and 10) |
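A minimal sketch of loading the data and separating the 11 input columns from the `quality` target. It assumes the data comes from the UCI red-wine CSV, which is semicolon-separated and uses lower-case column names such as `fixed acidity`. The `load_wine` helper is hypothetical; the slides do not show the loading code.

```python
import io

import pandas as pd

# Column names as they appear in the UCI red-wine CSV (assumed source file).
FEATURE_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
TARGET_COLUMN = "quality"


def load_wine(source, sep=";"):
    """Read the CSV and split it into the 11 feature columns and the target."""
    df = pd.read_csv(source, sep=sep)
    X = df[FEATURE_COLUMNS]
    y = df[TARGET_COLUMN]
    return X, y


# Usage with a two-row in-memory sample instead of the real file.
sample = io.StringIO(
    '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";'
    '"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";'
    '"pH";"sulphates";"alcohol";"quality"\n'
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
)
X, y = load_wine(sample)
print(X.shape, y.tolist())  # (2, 11) [5, 5]
```

With the real file, the call would be something like `load_wine("winequality-red.csv")`; that path is an assumption.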
2. Data preprocessing
Rows: 1599 -> 1562 (37 rows dropped)
2. Data preprocessing with visualization
-> Residual sugar, free sulfur dioxide, pH
2. Data preprocessing
- First, check the skewness of each column
-> Apply a log transformation to the skewed columns
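One way to implement this step with pandas is to compute the sample skewness of every column and log-transform only the strongly skewed ones. Two details are assumptions, because the slides do not give the exact rule: the threshold of 1.0, and the use of `log1p` (log of 1 + x, which tolerates zero values).

```python
import numpy as np
import pandas as pd

SKEW_THRESHOLD = 1.0  # hypothetical cut-off; the slides do not state one


def log_transform_skewed(df, threshold=SKEW_THRESHOLD):
    """Return a copy of df with log1p applied to columns whose |skew| > threshold."""
    skewness = df.skew()
    skewed_cols = skewness[skewness.abs() > threshold].index.tolist()
    out = df.copy()
    # log1p(x) = log(1 + x) stays defined for zero values (e.g. citric acid = 0).
    out[skewed_cols] = np.log1p(out[skewed_cols])
    return out, skewed_cols


df = pd.DataFrame({
    "chlorides": [0.05, 0.06, 0.07, 0.08, 0.61],  # long right tail
    "pH": [3.1, 3.2, 3.3, 3.4, 3.5],              # symmetric
})
transformed, cols = log_transform_skewed(df)
print(cols)  # ['chlorides']
```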
3. Modeling
There are many more normal wines than excellent or poor wines
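The slides group wines into poor, normal and excellent classes but do not state the score boundaries. The sketch below assumes hypothetical cut points (0-4 poor, 5-6 normal, 7-10 excellent). It shows how `pd.cut` turns integer quality scores into those classes and how `value_counts` exposes the imbalance.

```python
import pandas as pd

# Hypothetical boundaries: [0, 4] poor, (4, 6] normal, (6, 10] excellent.
BINS = [0, 4, 6, 10]
LABELS = ["poor", "normal", "excellent"]


def bin_quality(quality):
    """Map integer quality scores (0-10) to three ordered classes."""
    return pd.cut(quality, bins=BINS, labels=LABELS, include_lowest=True)


quality = pd.Series([3, 5, 5, 6, 6, 6, 5, 7, 8, 4])
classes = bin_quality(quality)
print(classes.value_counts().to_dict())  # number of wines per class
```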
3. Modeling – Artificial Neural Network (ANN)
- Seed function: fixes the state of the random number generator so that the code produces the same random numbers on every run, on the same machine or on different machines
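A minimal sketch of such a seed function, assuming the training code relies on Python's `random` module and NumPy. The `set_seed` name and the value 42 are arbitrary choices. A deep-learning framework keeps its own generator, which would also need seeding; TensorFlow, for example, provides `tf.random.set_seed`.

```python
import random

import numpy as np

SEED = 42  # any fixed integer works; 42 is an arbitrary choice


def set_seed(seed=SEED):
    """Seed Python's and NumPy's global random generators."""
    random.seed(seed)
    np.random.seed(seed)


set_seed()
first = (random.random(), np.random.rand(3))
set_seed()
second = (random.random(), np.random.rand(3))
print(first[0] == second[0], np.array_equal(first[1], second[1]))  # True True
```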
3. Modeling – ANN1 model
- ANN1 model
- 7 hidden layers
- Initially, set the number of epochs to 1000
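The slides do not show the network code. As a hedged illustration, the sketch below uses scikit-learn's `MLPClassifier` in place of whatever framework the project used. It has seven hidden layers of 32 ReLU units (the width is an assumption) and `max_iter=1000`, which counts training epochs for the `adam` solver. It trains on synthetic data from `make_classification` shaped like the wine features (11 inputs, 3 classes), so its accuracy says nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the wine data: 11 features, 3 quality classes.
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

ann1 = MLPClassifier(
    hidden_layer_sizes=(32,) * 7,  # 7 hidden layers, 32 units each (assumed width)
    activation="relu",
    solver="adam",
    max_iter=1000,                 # with adam, one iteration = one epoch
    random_state=42,
)
ann1.fit(X_train, y_train)
accuracy = ann1.score(X_test, y_test)
print(f"ANN1 test accuracy: {accuracy:.3f}")
```

By default, `MLPClassifier` can stop before 1000 epochs. Training ends early once the loss fails to improve by more than `tol` for `n_iter_no_change` consecutive epochs. `ann1.n_iter_` reports how many epochs actually ran.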
3. Modeling – ANN2 model
- Reduced the number of hidden layers from 7 to 4
Accuracy: 51% -> 55%
3. Modeling – ANN3 model
- Further reduced the number of hidden layers from 4 to 2
Accuracy: 55% -> 60%
F1 score: 0.363
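The ANN1 to ANN3 experiments change only the number of hidden layers. The sketch below reproduces that loop under stated assumptions. scikit-learn's `MLPClassifier` stands in for the original framework, synthetic data stands in for the wine features, and the 32-unit width is assumed. It will therefore not reproduce the 51%, 55% and 60% figures.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(
    n_samples=300, n_features=11, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

results = {}
for name, depth in [("ANN1", 7), ("ANN2", 4), ("ANN3", 2)]:
    model = MLPClassifier(
        hidden_layer_sizes=(32,) * depth,  # assumed 32 units per hidden layer
        max_iter=1000,
        random_state=42,                   # same seed so only depth changes
    )
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)

for name, acc in results.items():
    print(f"{name}: test accuracy {acc:.3f}")
```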
3. Modeling – ANN4 model
F1 score: 0.363
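Accuracy and F1 can diverge sharply when one class dominates. The toy example below uses hypothetical labels and a model that always predicts the majority class. Its accuracy is 0.8, yet its macro-averaged F1 (the unweighted mean of the per-class F1 scores) is only about 0.30. The slides do not say which F1 averaging was used, so macro averaging is an assumption here.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels: 8 "normal" wines, 1 "poor", 1 "excellent".
y_true = ["normal"] * 8 + ["poor", "excellent"]
# A model that always predicts the majority class.
y_pred = ["normal"] * 10

acc = accuracy_score(y_true, y_pred)
# zero_division=0 scores the never-predicted classes as 0 instead of warning.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(f"accuracy={acc:.2f}, macro F1={macro_f1:.3f}")  # accuracy=0.80, macro F1=0.296
```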
3. Modeling – ANN5 model
F1 score: 0.275
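The evaluation table lists ANN5 as using reciprocal weights. Assuming this means weighting each class by the reciprocal of its sample count, a minimal sketch follows; the class indices and counts are hypothetical. A dictionary of this form can be passed to Keras through `model.fit(..., class_weight=weights)`. scikit-learn's `class_weight="balanced"` uses n_samples / (n_classes * count), which is the same reciprocal scaled by a constant.

```python
from collections import Counter


def reciprocal_class_weights(labels):
    """Return {class: 1 / number_of_samples_in_class}."""
    counts = Counter(labels)
    return {cls: 1.0 / n for cls, n in counts.items()}


# Hypothetical class indices: 0 = poor, 1 = normal, 2 = excellent.
labels = [1] * 6 + [0] * 2 + [2] * 2
weights = reciprocal_class_weights(labels)
print(weights)  # {1: 0.16666666666666666, 0: 0.5, 2: 0.5}
```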
3. Modeling – ANN6 model
- However, the F1 score is still very poor
F1 score: 0.363
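Per the evaluation table, ANN6 uses SMOTE oversampling. SMOTE creates synthetic minority-class samples in three steps: pick a minority sample, choose one of its k nearest minority-class neighbours, and place a new point at a random position on the segment between them. The NumPy sketch below illustrates that idea. The `smote` function is a hypothetical helper, not the project's code. In practice, the imbalanced-learn library provides `SMOTE().fit_resample(X, y)`, and oversampling is normally applied to the training split only.

```python
import numpy as np


def smote(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances inside the minority class.
    dists = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # random minority samples
    pick = rng.integers(0, k, size=n_new)   # one of each sample's k neighbours
    partner = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # position along the segment, in [0, 1)
    return X_minority[base] + gap * (X_minority[partner] - X_minority[base])


X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(X_min, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```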
3. Modeling – ANN7 model
F1 score: 0.45
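Per the evaluation table, ANN7 uses stratified sampling. A stratified split keeps each class's share the same in the training and test sets, which matters when normal wines dominate. Below is a minimal sketch with scikit-learn's `train_test_split(..., stratify=y)` on hypothetical labels.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 normal, 10 poor, 10 excellent.
X = [[i] for i in range(100)]
y = ["normal"] * 80 + ["poor"] * 10 + ["excellent"] * 10

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(dict(sorted(Counter(y_test).items())))  # {'excellent': 2, 'normal': 16, 'poor': 2}
```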
3. Modeling – Random Forest Classifier Model
F1 score: 0.58
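A hedged sketch of training and scoring a random forest with scikit-learn's `RandomForestClassifier`, reporting the same two metrics as the evaluation table. Synthetic, deliberately imbalanced data (roughly 70% of samples in the middle class) stands in for the wine features. The default of 100 trees is kept, since the slides do not list the hyperparameters used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in: class 1 plays the role of "normal" wines.
X, y = make_classification(
    n_samples=400, n_features=11, n_informative=6, n_classes=3,
    weights=[0.15, 0.7, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

rf = RandomForestClassifier(random_state=0)  # 100 trees by default
rf.fit(X_train, y_train)
pred = rf.predict(X_test)

accuracy = accuracy_score(y_test, pred)
macro_f1 = f1_score(y_test, pred, average="macro")
print(f"Random forest: accuracy {accuracy:.3f}, macro F1 {macro_f1:.3f}")
```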
3. Modeling – Gaussian Naïve Bayes Model
F1 score: 0.55
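Gaussian Naive Bayes models each feature within each class as an independent normal distribution and predicts the class with the highest posterior probability. The minimal scikit-learn sketch below uses a hypothetical one-feature example. It shows the per-class means the model learns (`theta_`) and the resulting predictions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical one-feature data: class 0 clusters near 2, class 1 near 11.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])

gnb = GaussianNB().fit(X, y)
print(gnb.theta_.ravel())           # per-class means: 2.0 and 11.0
print(gnb.predict([[2.5], [9.0]]))  # 2.5 -> class 0, 9.0 -> class 1
```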
4. Evaluation
Model | Method | Accuracy (%) | F1-score |
ANN1 | 7 hidden layers | 51.25 | |
ANN2 | 4 hidden layers | 55.00 | |
ANN3 | 2 hidden layers | 60.62 | |
ANN4 | Class weights | 58.75 | 0.36 |
ANN5 | Reciprocal class weights | 58.75 | 0.27 |
ANN6 | SMOTE oversampling | 60.52 | 0.36 |
ANN7 | Stratified sampling | 57.31 | 0.45 |
Random Forest | | 63.05 | 0.58 |
Gaussian NB | | 56.66 | 0.55 |