Machine Learning With Datasets
Ai4Ga
Unit 3.4
1
What are datasets?
A dataset is a collection of data. In the case of tabular data, a dataset corresponds to one or more tables, where every column of a table represents a particular feature and each row corresponds to one example in the dataset.
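To make the column and row idea concrete, here is a minimal Python sketch of a tiny table. The column names and values are made up for illustration and are not taken from any real dataset.

```python
# A tiny, made-up tabular dataset: each dict is one row (one example),
# and each key is one column (one feature).
dataset = [
    {"brand": "A", "style": "slim", "price_in_dollars": 40},
    {"brand": "B", "style": "straight", "price_in_dollars": 55},
    {"brand": "A", "style": "bootcut", "price_in_dollars": 48},
]

features = list(dataset[0].keys())  # the columns of the table
num_examples = len(dataset)         # the rows of the table

print("Features:", features)
print("Number of examples:", num_examples)
```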
2
Jeans dataset from AI Lab.
Where do datasets come from?
Scientists, companies, engineers, and data scientists
For example, data can come from
Data can come from almost anywhere and take any form.
3
Fun fact: Data is plural.
In Latin, data is the plural of datum. In specialized scientific fields, it is also treated as a plural in English. Therefore, we say “data were collected and classified,” not “data was collected and classified.”
Phenomenon (singular), phenomena (plural)
An observable fact or event that occurs in a natural or designed system, especially one whose cause or explanation is in question.
Orientation to AI Lab & Its ML Pipeline
4
Use this link to access AI Lab
Step 1: Selecting a Dataset
AI Lab has a lot of different datasets to choose from that have already been preloaded into the tool. For our practice activity, select the Jeans Measurements dataset.
5
Identifying a Question
After we identify our dataset, we need to review its features to figure out what questions we can answer with it.
6
Jeans dataset from AI Lab.
What features best predict the price of jeans?
7
Our QUESTION:
Selecting a feature we would like to predict
8
Label of the Feature we want to predict: price in dollars
9
Labels of the Features we want to train our predictor model with: Let’s try Brand, Style, and Mens or Womens
10
Let’s train our model
11
How is our model trained?
Step 1 - Our dataset is first split into two sets by AI Lab: a training set and a test set. This is a random process, so your results may vary based on which data were included in the training set or the test set.
Step 2 - An ML algorithm is used to find patterns in the training data. Patterns are statistical relationships, called correlations, between the value to be predicted and the features selected for training.
Step 3 - After training is complete and the model is created, it needs to be tested for accuracy using the test set.
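The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are made up, the "model" simply remembers the average price for each brand, and the 10% test fraction follows the split described in these slides. It is not the algorithm AI Lab actually uses.

```python
import random
from collections import defaultdict

# Made-up rows for illustration only (not the real Jeans Measurements data).
prices_by_brand = {
    "A": [38, 40, 42, 41, 39, 40, 43],
    "B": [58, 60, 62, 61, 59, 60, 57],
    "C": [80, 78, 82, 79, 81, 80],
}
rows = [
    {"brand": brand, "price": price}
    for brand, prices in prices_by_brand.items()
    for price in prices
]

def split_dataset(rows, test_fraction=0.1, seed=None):
    """Step 1: randomly split the rows into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train(training_rows, feature, target):
    """Step 2: find a simple pattern, the average target for each feature value."""
    grouped = defaultdict(list)
    for row in training_rows:
        grouped[row[feature]].append(row[target])
    averages = {value: sum(v) / len(v) for value, v in grouped.items()}
    overall = sum(row[target] for row in training_rows) / len(training_rows)
    return averages, overall

def predict(model, row, feature):
    """Use the learned averages; fall back to the overall average if unseen."""
    averages, overall = model
    return averages.get(row[feature], overall)

training_set, test_set = split_dataset(rows)       # Step 1
model = train(training_set, "brand", "price")      # Step 2
for row in test_set:                               # Step 3
    guess = predict(model, row, "brand")
    print(f"brand={row['brand']} actual={row['price']} predicted={guess:.2f}")
```

Because the split is random, running the sketch several times prints different test rows and slightly different predictions.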
12
Testing the Model - Press Continue to skip this animation!
13
Evaluating the Accuracy of the Model
This selection and test had an accuracy of 75%. Let’s see how well it did.
14
Explore Correct Predictions
Notice that “correct” doesn’t mean the prediction has to match exactly. A prediction counts as correct if it falls within a range of +/- 5%.
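A small Python sketch shows how a tolerance like this turns individual predictions into a percent-correct score. It assumes the 5% is measured relative to the actual price, which the slides do not state, and the prices below are made up.

```python
def is_correct(predicted, actual, tolerance=0.05):
    """True if the prediction is within +/- tolerance of the actual value.

    Assumption: the tolerance is a fraction of the actual value.
    """
    return abs(predicted - actual) <= tolerance * abs(actual)

def accuracy(predictions, actuals, tolerance=0.05):
    """Percent of predictions that count as correct."""
    correct = sum(is_correct(p, a, tolerance) for p, a in zip(predictions, actuals))
    return 100 * correct / len(actuals)

# Made-up prices for illustration.
predicted_prices = [41.0, 60.0, 48.5, 30.0]
actual_prices = [40.0, 50.0, 50.0, 31.0]

print(accuracy(predicted_prices, actual_prices))  # prints 75.0
```

Only the second prediction (60 against an actual price of 50) falls outside the range, so three of the four made-up predictions count as correct.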
15
Explore Incorrect Predictions
Wow! Look at how different these predictions are. They are off by way more than +/- 5%.
16
Note - Your results may differ from the slides, even if you picked the same features and dataset. Remember, the AI trains with 90% of the data and tests with the remaining 10%. Which examples end up in each set can change the result.
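The effect of the random split can be seen in a short Python sketch. The row count of 50 and the seed values are arbitrary choices for illustration; how AI Lab performs its split internally is not described in these slides.

```python
import random

def split_indices(n_examples, test_fraction=0.1, seed=None):
    """Randomly assign row numbers to a 90% training set and a 10% test set."""
    rng = random.Random(seed)
    indices = list(range(n_examples))
    rng.shuffle(indices)
    n_test = max(1, round(n_examples * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])

# Each seed stands in for one run of the tool: same data, different split.
for seed in (1, 2, 3):
    train_idx, test_idx = split_indices(50, seed=seed)
    print(f"seed={seed}: {len(train_idx)} training rows, test rows = {test_idx}")
```

Each run keeps the same 45/5 proportions but tests on different rows, which is why the accuracy can change from run to run.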
17
Now you try!
Click Try Again in the bottom left corner.
Keep the prediction set to the price of jeans, and select two or three features you think will give the most accurate results. Place your answers in the table below.
18
| Attempt | Feature 1 | Feature 2 | Feature 3 | Percent Correct |
| Attempt 1 | Brand | | | 75 |
| Attempt 2 | | | | |
| Attempt 3 | | | | |
| Attempt 4 | | | | |
| Attempt 5 | | | | |
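If you want to plan your attempts systematically, the Python sketch below lists every way to pick two or three features. It only uses the feature names mentioned in these slides; the real dataset may offer more features.

```python
from itertools import combinations

# Feature names mentioned in these slides; the real dataset may offer more.
feature_names = ["Brand", "Style", "Mens or Womens", "Cotton Contents"]

# Every selection of two features, followed by every selection of three.
candidate_sets = [
    combo
    for size in (2, 3)
    for combo in combinations(feature_names, size)
]

for number, combo in enumerate(candidate_sets, start=1):
    print(f"Option {number}: {', '.join(combo)}")
```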
A solution to the Jeans dataset
Brand, Cotton Contents, Mens or Womens
19