DA5030 Project Rubric
Total points possible: 135 | Points earned: 0
Name:
Section/Term:

Note: It is impossible to earn all the points, because many items are mutually exclusive and depend on the chosen data set. The project is graded out of 100 points, so it is possible to score more than 100%. Be sure to point us to the code where each piece of work has been done.

Data Acquisition | Points Possible | Points Earned | Code Block/Lines (required) | Comments/Explanations (required; justify your work)
acquisition of data (e.g., CSV or flat file) | 2
other: |

Data Exploration | Points
exploratory data plots | 5
detection of outliers | 5
correlation/collinearity analysis | 5
other: |

Data Cleaning & Shaping | Points
data imputation | 5
proper encoding of data for algorithms used | 5
normalization/standardization of feature values | 3
feature engineering: dummy codes | 2
feature engineering: PCA | 5
feature engineering: new derived features | 3
other: |

Model Construction & Evaluation | Points
creation of training & validation subsets | 2
construction of at least three related models | 15
evaluation of fit of models with holdout method | 3
evaluation with k-fold cross-validation | 5
tuning of models | 5
comparison of models | 5
interpretation of results/prediction with interval | 5
use of bagging with homogeneous learners | 5
construction of ensemble model | 10
other: |

Data Mining Process | Points
follows CRISP-DM | 5
explanation of steps; good journaling & coding | 5
peer reviews | 5
video presentation | 5

Model Deployment | Points
Interactive Dashboard (e.g., Shiny) | 10
Other Model Deployment for Actual Use | 10