1 of 11

CSE 163

Machine Learning

Suh Young Choi

🎶 Listening to: Mariusz Duda

💬 Before Class: Do you have a favorite tree?

2 of 11

Last Time

Statistics 101
Hypothesis testing
Best practices for research

This Time

Machine Learning

Terminology
Types of ML

ML Code (scikit-learn)
Decision Trees

3 of 11

Updates & Reminders

Reading Assignment 4 is out on Canvas/Hypothesis now

Due Tuesday, 2/17

THA 4 is due tomorrow (2/12)

If you need to prioritize anything, focus on the Creative Component!

Checkpoints and Reading Assignments will open for submissions after initial grades are released

Suh Young will not be here next week!

TAs are hosting a watch party of pre-recorded lectures instead

4 of 11

Checking in

Advanced material / Out-of-scope deductions

The expectation is that submitted work is aligned with course material and guidelines
The more we see it in your submission, the larger the deduction becomes
Unresolved cases result in a 0 + forfeiture of resubmission ☹

Resubmission Policy

Two resub cycles are open after the initial round of feedback per assessment
You can resubmit one technical component per cycle!
Exact dates for final submission days will be posted to Ed

5 of 11

Terms

Machine learning

Model
Machine learning algorithm

Data

Training Set
Example
Feature
Label

Types of ML

Regression
Classification
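To make the terms above concrete, here is a minimal sketch using a small made-up table. The column names, values, and the price cutoff are all hypothetical, chosen only for illustration. Each row is one example, the `sq_ft` and `bedrooms` columns are features, and the label is whatever we want to predict: a number for regression, a category for classification.

```python
import pandas as pd

# Hypothetical training set: every row is one example
data = pd.DataFrame({
    'sq_ft': [850, 1200, 1500, 2100],
    'bedrooms': [2, 3, 3, 4],
    'price': [300000, 420000, 480000, 650000],
})

# Features: the columns that describe each example
features = data[['sq_ft', 'bedrooms']]

# Label for a regression task: a numeric value to predict
regression_labels = data['price']

# Label for a classification task: a category to predict
# (the 450000 cutoff is an arbitrary choice for this sketch)
classification_labels = data['price'] > 450000

print(features.shape)
print(classification_labels.tolist())
```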

6 of 11

Features Matter

There exist hundreds of different model types, but the most important thing is having good features to describe your data

Garbage In => Garbage Out

7 of 11

Decision Tree

Like a series of GIANT if/else branches! (Hence… tree)
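The if/else analogy can be sketched directly in code. The function below is a hand-written stand-in, not a trained model: the feature names (`is_raining`, `temperature_f`), the 60-degree threshold, and the labels are all made up for illustration. A real decision tree would learn which questions to ask, and in what order, from the training set.

```python
def predict_activity(is_raining, temperature_f):
    """Hand-written stand-in for a tiny decision tree (hypothetical rules)."""
    # Root of the tree: first question about the example
    if is_raining:
        # Leaf: a prediction (label)
        return 'stay inside'
    else:
        # Inner node: a second question on a different feature
        if temperature_f >= 60:
            return 'go to the park'
        else:
            return 'go for a short walk'


print(predict_activity(True, 70))
print(predict_activity(False, 70))
print(predict_activity(False, 45))
```

Each path from the first `if` down to a `return` corresponds to one branch of the tree, and each `return` is a leaf.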

8 of 11

Code Recap (classification)

General ML pipeline (at least to start)

For classification tasks

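from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Separate data (data is assumed to be a pandas DataFrame with a 'target' column)
features = data.loc[:, data.columns != 'target']
labels = data['target']

# Create and train model
model = DecisionTreeClassifier()
model.fit(features, labels)

# Predict on some data
predictions = model.predict(features)

# Assess accuracy (fraction of predictions that match the labels)
accuracy_score(labels, predictions)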
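As a runnable sketch of the classification pipeline above, the example below builds a tiny made-up dataset so that every step can be executed end to end. The tree measurements and the column names `height_m` and `has_needles` are hypothetical, and `random_state=0` is added only to make the run repeatable. Because the model is scored on the same examples it was trained on, the accuracy here is an optimistic number.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical dataset: each row describes one tree
data = pd.DataFrame({
    'height_m': [3.0, 25.0, 4.5, 30.0, 2.5, 40.0],
    'has_needles': [0, 1, 0, 1, 0, 1],
    'target': ['deciduous', 'conifer', 'deciduous',
               'conifer', 'deciduous', 'conifer'],
})

# Separate data into features (everything but 'target') and labels
features = data.loc[:, data.columns != 'target']
labels = data['target']

# Create and train model
model = DecisionTreeClassifier(random_state=0)
model.fit(features, labels)

# Predict on the same data the model was trained on
predictions = model.predict(features)

# Assess accuracy: fraction of predictions that match the labels
accuracy = accuracy_score(labels, predictions)
print(accuracy)
```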

9 of 11

Code Recap (regression)

General ML pipeline (at least to start)

For regression tasks

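from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Separate data (data is assumed to be a pandas DataFrame with a 'target' column)
features = data.loc[:, data.columns != 'target']
labels = data['target']

# Create and train model
model = DecisionTreeRegressor()
model.fit(features, labels)

# Predict on some data
predictions = model.predict(features)

# Assess MSE (mean squared error; lower is better)
mean_squared_error(labels, predictions)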
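The regression pipeline can be sketched the same way. The housing numbers and the column names `sq_ft` and `bedrooms` below are made up for illustration, and `random_state=0` is added only for repeatability. The only changes from the classification version are the numeric label, the `DecisionTreeRegressor` model, and the error metric.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical dataset: 'target' is a numeric price to predict
data = pd.DataFrame({
    'sq_ft': [850, 1200, 1500, 2100],
    'bedrooms': [2, 3, 3, 4],
    'target': [300000.0, 420000.0, 480000.0, 650000.0],
})

# Separate data into features and labels
features = data.loc[:, data.columns != 'target']
labels = data['target']

# Create and train model
model = DecisionTreeRegressor(random_state=0)
model.fit(features, labels)

# Predict on the same data the model was trained on
predictions = model.predict(features)

# Assess MSE: average of the squared differences (lower is better)
mse = mean_squared_error(labels, predictions)
print(mse)
```

An error measured on the training data itself tends to look better than the model would do on examples it has never seen.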

10 of 11

Group Work:

Best Practices

When you first start working with this group:

Introduce yourself!
If possible, angle one of your screens so that everyone can discuss together

Tips:

Start by making sure everyone agrees to work on the same problem
Make sure everyone gets a chance to contribute!
Ask if everyone agrees and periodically ask each other questions!
Call TAs over for help if you need any!

11 of 11

Next Time

Evaluating Machine Learning

Before Next Time

Complete Lesson 15

Remember: lessons are not for points, but they do go towards Weekly Tokens

THA 4 due tomorrow night!
Get started on EDA or Milestone!