CS 5/7394 - Spring 2022
Due Date: April 11, 2022 by 11:59pm.
Project 3
Datasets: Uber Fares Dataset, How Many Upvotes, Reddit Popularity, Wake County Real Estate Prices, Fashion MNIST, CIFAR-10
Clean the Data (deal with missing values)
Use an Ordinal Encoder
Use a One Hot Encoder
Implement a custom transformer
Scale/normalize/standardize features using sklearn.preprocessing
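The preprocessing tasks above can be combined in a single ColumnTransformer. The following is a minimal sketch, not a required solution: the toy DataFrame, its column names, and the RatioAdder transformer are all made up for illustration and do not come from any of the listed datasets.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler


class RatioAdder(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: appends column 0 / column 1 as a new feature."""

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        ratio = X[:, [0]] / X[:, [1]]
        return np.hstack([X, ratio])


# Toy data (assumed column names) with one missing numeric value.
df = pd.DataFrame({
    "distance": [1.0, 2.5, np.nan, 4.0],
    "passengers": [1.0, 2.0, 1.0, 4.0],
    "size": ["small", "large", "medium", "small"],
    "city": ["A", "B", "A", "C"],
})

# Numeric columns: fill missing values, add the ratio feature, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("ratio", RatioAdder()),
    ("scale", StandardScaler()),
])

preprocess = ColumnTransformer([
    ("num", numeric, ["distance", "passengers"]),
    # Ordinal encoding with an explicit category order.
    ("ord", OrdinalEncoder(categories=[["small", "medium", "large"]]), ["size"]),
    # One-hot encoding for an unordered category.
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

X_prepared = preprocess.fit_transform(df)
print(X_prepared.shape)
```

Passing the category order explicitly to OrdinalEncoder matters when the categories have a natural ranking; otherwise the encoder sorts them alphabetically.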
Use SGDClassifier
Use sklearn.linear_model.LinearRegression
Use sklearn.tree.DecisionTreeRegressor
Use sklearn.ensemble.RandomForestClassifier
Use sklearn.neighbors.KNeighborsClassifier
Use a One-vs-One (OvO) or One-vs-Rest (OvR) classifier
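As one possible illustration of the OvO/OvR task, the sketch below wraps SGDClassifier in both multiclass strategies. The 4-class synthetic data from make_classification is an assumption standing in for a real dataset such as Fashion MNIST.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic 4-class problem (placeholder for a real multiclass dataset).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=4, random_state=0
)

# OvR trains one binary classifier per class.
ovr = OneVsRestClassifier(SGDClassifier(random_state=0)).fit(X, y)

# OvO trains one binary classifier per pair of classes.
ovo = OneVsOneClassifier(SGDClassifier(random_state=0)).fit(X, y)

print(len(ovr.estimators_), len(ovo.estimators_))
```

With N classes, OvR fits N binary models and OvO fits N(N-1)/2, which is worth keeping in mind for the larger image datasets.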
Use k-fold Cross Validation (cross_val_score)
Use StratifiedKFold cross validation
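The two cross-validation tasks differ only in how the folds are built. A minimal sketch, assuming synthetic binary data and a KNeighborsClassifier as the model under evaluation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (placeholder for a real dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5)

# Plain k-fold: folds are formed without looking at the labels.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores_kfold = cross_val_score(clf, X, y, cv=kfold, scoring="accuracy")

# Stratified k-fold: each fold keeps roughly the same class proportions as y.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores_strat = cross_val_score(clf, X, y, cv=skf, scoring="accuracy")

print(scores_kfold.mean(), scores_strat.mean())
```

Stratification tends to matter most when classes are imbalanced, since an unstratified fold can end up with very few examples of a rare class.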
Use sklearn.metrics.mean_squared_error and at least one other sklearn.metrics option to evaluate model performance
Generate a confusion matrix
Generate a ROC curve or a related plot
Use GridSearchCV or RandomizedSearchCV to tune hyperparameters for a model
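The evaluation and tuning tasks above might look roughly like the sketch below. The synthetic data, the small parameter grid, and the choice of mean_absolute_error as the second regression metric are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (
    confusion_matrix,
    mean_absolute_error,
    mean_squared_error,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import GridSearchCV, train_test_split

# --- Classification: tune with grid search, then evaluate on held-out data ---
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}  # illustrative grid
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="roc_auc"
)
search.fit(X_train, y_train)

best = search.best_estimator_
y_pred = best.predict(X_test)
y_score = best.predict_proba(X_test)[:, 1]  # probability of the positive class

cm = confusion_matrix(y_test, y_pred)
fpr, tpr, thresholds = roc_curve(y_test, y_score)  # points for plotting the ROC curve
auc = roc_auc_score(y_test, y_score)

# --- Regression: mean squared error plus one other metric ---
Xr, yr = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=0)
reg = LinearRegression().fit(Xr_train, yr_train)
mse = mean_squared_error(yr_test, reg.predict(Xr_test))
mae = mean_absolute_error(yr_test, reg.predict(Xr_test))

print(search.best_params_, cm, auc, mse, mae)
```

The fpr and tpr arrays can be passed directly to a plotting library to draw the ROC curve.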
Use an Ensemble of Methods
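One way to approach the ensemble task is a VotingClassifier over several of the models listed earlier. This is a sketch on synthetic data, and the choice of hard voting over these three estimators is an assumption; other ensembles (stacking, bagging, boosting) would also qualify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (placeholder for a real dataset).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hard voting: each model casts one vote and the majority class wins.
ensemble = VotingClassifier(
    estimators=[
        ("sgd", SGDClassifier(random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
print(accuracy)
```

Hard voting is used here because SGDClassifier with its default hinge loss does not provide predict_proba, which soft voting would require.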
Evaluate your system on the Test Data
Create a single pipeline that performs the full process from data preparation to final prediction.
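A single end-to-end pipeline could be structured roughly as follows. The generated DataFrame, its column names (distance, hour, zone), and the synthetic fare-like target are assumptions for illustration only; they do not reflect the actual schema of the Uber Fares data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Made-up data: two numeric columns, one categorical column, some missing values.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "distance": rng.uniform(0.5, 20.0, n),
    "hour": rng.integers(0, 24, n).astype(float),
    "zone": rng.choice(["north", "south", "east"], n),
})
df.loc[::25, "distance"] = np.nan
target = 2.5 + 1.8 * df["distance"].fillna(df["distance"].median()) + rng.normal(0, 1, n)

X_train, X_test, y_train, y_test = train_test_split(
    df, target, test_size=0.2, random_state=0
)

# Data preparation: impute and scale numeric columns, one-hot encode the category.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["distance", "hour"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["zone"]),
])

# Full process: preparation followed by the final estimator.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", DecisionTreeRegressor(max_depth=5, random_state=0)),
])

model.fit(X_train, y_train)           # preprocessing is fit on training data only
predictions = model.predict(X_test)   # raw test rows go in, predictions come out
rmse = mean_squared_error(y_test, predictions) ** 0.5
print(rmse)
```

Keeping preprocessing inside the pipeline means the imputer and scaler statistics are learned from the training split alone, so the test-set evaluation is not contaminated by leakage.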
Min tasks for a dataset: 5 tasks
5394: Min 2 datasets
7394: Min 3 datasets
Data Set Links:
Uber Fares Dataset
https://www.kaggle.com/datasets/yasserh/uber-fares-dataset
How Many Upvotes
https://www.kaggle.com/datasets/umairnsr87/predict-the-number-of-upvotes-a-post-will-get
Reddit Popularity
https://www.kaggle.com/datasets/kashyapgohil/predicting-reddit-post-popularity-through-comments
Wake County Real Estate Price Prediction
https://www.kaggle.com/datasets/nkanda/wake-county-housing-nc?select=WakeCountyHousing.csv
Fashion MNIST
https://github.com/zalandoresearch/fashion-mnist
CIFAR-10
http://www.cs.toronto.edu/~kriz/cifar.html