My Data Cleaning Checklist
Datasets: DailyActivity_merged, sleepDay_merged
Dataset Back Up
- DailyActivity_merged: Dataset Duplicated
- sleepDay_merged: Dataset Duplicated
Ensure Data is in Tabular Format - Organized in rows and columns.
Duplicates: Did you remove duplicates in spreadsheets using the Remove Duplicates function or DISTINCT in SQL?
Did you check for irrelevant data?
- DailyActivity_merged: TrackerDistance column removed - same as TotalDistance column
- DailyActivity_merged: LoggedActivitiesDistance removed as all 0
Null data: Did you search for NULLs using conditional formatting and filters?
- DailyActivity_merged: Used Conditional Formatting to Color Empty Cells - None Observed
- sleepDay_merged: Used Conditional Formatting to Color Empty Cells - None Observed
Remove Extra Spaces: Did you remove any extra spaces or characters using the TRIM function?
- DailyActivity_merged: N/A - Numerical Data with no extra spaces
- sleepDay_merged: N/A - Numerical Data with no extra spaces
Misspelled words: Did you locate all misspellings?
- DailyActivity_merged: N/A - Numerical Data
- sleepDay_merged: N/A - Numerical Data
Mistyped numbers: Did you double-check that your numeric data has been entered correctly?
Inconsistent Capitalization
- DailyActivity_merged: N/A - Numerical Data
- sleepDay_merged: N/A - Numerical Data
Incorrect Punctuation
- DailyActivity_merged: N/A - Numerical Data
- sleepDay_merged: N/A - Numerical Data
Mismatched data types: Did you check that numeric, date, and string data are typecast correctly?
Messy (inconsistent) strings: Did you make sure that all of your strings are consistent and meaningful?
Messy (inconsistent) date formats: Did you format the dates consistently throughout your dataset?
- Dates are formatted to mm/dd/yyyy
Misleading variable labels (columns): Did you name your columns meaningfully?
Truncated data: Did you check for truncated or missing data that needs correction?
Business Logic: Did you check that the data makes sense given your knowledge of the business?
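
Several of the checks above (duplicates via DISTINCT, NULL or empty cells, a column that only repeats another, a column that is all 0) can also be run as SQL queries rather than by eye in a spreadsheet. The following is a minimal sketch using Python's built-in sqlite3 module. The table name daily_activity, the Id and ActivityDate columns, and every sample value are assumptions made up for illustration; only TotalDistance, TrackerDistance, and LoggedActivitiesDistance are column names taken from the checklist.

```python
import sqlite3

# Illustrative rows only, not taken from the real dataset.
# Id and ActivityDate are assumed column names.
SAMPLE_ROWS = [
    (1, "04/12/2016", 8.5, 8.5, 0.0),
    (1, "04/13/2016", 6.9, 6.9, 0.0),
    (1, "04/13/2016", 6.9, 6.9, 0.0),  # exact duplicate of the previous row
    (2, "04/12/2016", 3.2, 3.2, 0.0),
]


def build_table(conn):
    """Create a hypothetical daily_activity table and load the sample rows."""
    conn.execute(
        "CREATE TABLE daily_activity ("
        "Id INTEGER, ActivityDate TEXT, TotalDistance REAL, "
        "TrackerDistance REAL, LoggedActivitiesDistance REAL)"
    )
    conn.executemany(
        "INSERT INTO daily_activity VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS
    )


def count_duplicate_rows(conn):
    """Rows that disappear under SELECT DISTINCT are exact duplicates."""
    total = conn.execute("SELECT COUNT(*) FROM daily_activity").fetchone()[0]
    distinct = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM daily_activity)"
    ).fetchone()[0]
    return total - distinct


def count_null_cells(conn, column):
    """Count NULL or blank cells in one column.

    The column name is interpolated into the query, so it must come from a
    trusted, fixed list (as it does in this sketch), never from user input.
    """
    query = (
        f'SELECT COUNT(*) FROM daily_activity '
        f'WHERE "{column}" IS NULL OR TRIM("{column}") = \'\''
    )
    return conn.execute(query).fetchone()[0]


def columns_always_equal(conn, first, second):
    """True when two columns hold the same value in every row."""
    query = (
        f'SELECT COUNT(*) FROM daily_activity '
        f'WHERE "{first}" IS NOT "{second}"'
    )
    return conn.execute(query).fetchone()[0] == 0


def column_is_all_zero(conn, column):
    """True when every value in the column is 0 (NULLs count as non-zero)."""
    query = (
        f'SELECT COUNT(*) FROM daily_activity '
        f'WHERE "{column}" IS NULL OR "{column}" != 0'
    )
    return conn.execute(query).fetchone()[0] == 0


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_table(connection)
    print("duplicate rows:", count_duplicate_rows(connection))
    print("NULL cells in TotalDistance:",
          count_null_cells(connection, "TotalDistance"))
    print("TrackerDistance repeats TotalDistance:",
          columns_always_equal(connection, "TotalDistance", "TrackerDistance"))
    print("LoggedActivitiesDistance all 0:",
          column_is_all_zero(connection, "LoggedActivitiesDistance"))
```

A column that always equals another, or that is all 0, is a candidate for removal, which is the reasoning recorded in the checklist for TrackerDistance and LoggedActivitiesDistance.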
Verifying Data Cleaning Efforts:
- Go back to your original, uncleaned dataset and compare it with what you have now. Review the dirty data and try to identify any common problems.
- This is also the time to notice whether anything stands out as suspicious or potentially problematic in your data.
- Step back, take a big-picture view, and ask yourself: do the numbers make sense within the context of the business analysis?
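
The comparison against the original data can be partly automated. The sketch below assumes both versions are available as CSV text and uses only Python's standard library. The sample values, the Id and ActivityDate columns, and the 0 to 100 plausibility range for TotalDistance are all hypothetical, chosen only to show the idea of reporting what the cleaning removed and flagging values that do not make sense.

```python
import csv
import io

# Illustrative data only, not taken from the real dataset.
ORIGINAL_CSV = """Id,ActivityDate,TotalDistance,TrackerDistance
1,04/12/2016,8.5,8.5
1,04/13/2016,6.9,6.9
1,04/13/2016,6.9,6.9
2,04/12/2016,3.2,3.2
"""

CLEANED_CSV = """Id,ActivityDate,TotalDistance
1,04/12/2016,8.5
1,04/13/2016,6.9
2,04/12/2016,3.2
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def compare_versions(original_rows, cleaned_rows):
    """Summarize how many rows and which columns the cleaning removed."""
    original_cols = list(original_rows[0].keys()) if original_rows else []
    cleaned_cols = list(cleaned_rows[0].keys()) if cleaned_rows else []
    return {
        "rows_removed": len(original_rows) - len(cleaned_rows),
        "columns_removed": [c for c in original_cols if c not in cleaned_cols],
    }


def values_outside_range(rows, column, low, high):
    """Return the rows whose numeric value falls outside [low, high]."""
    return [row for row in rows if not (low <= float(row[column]) <= high)]


if __name__ == "__main__":
    original = load_rows(ORIGINAL_CSV)
    cleaned = load_rows(CLEANED_CSV)
    print(compare_versions(original, cleaned))
    # The 0 to 100 range is a hypothetical plausibility threshold.
    print(values_outside_range(cleaned, "TotalDistance", 0, 100))
```

If the reported differences do not match the changes written down in the checklist, that mismatch is the place to start looking for a cleaning mistake.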