1 of 11

CSE 163

Machine Learning

Suh Young Choi

🎶 Listening to: Mariusz Duda

💬 Before Class: Do you have a favorite tree?

2 of 11

Last Time

  • Statistics 101
  • Hypothesis testing
  • Best practices for research

This Time

  • Machine Learning
    • Terminology
    • Types of ML
  • ML Code (scikit-learn)
  • Decision Trees

3 of 11

Updates & Reminders

Reading Assignment 4 is out on Canvas/Hypothesis now

    • Due Tuesday, 2/17

THA 4 is due tomorrow (2/12)

    • If you need to prioritize anything, focus on the Creative Component!

Checkpoints and Reading Assignments will open for submissions after initial grades are released

Suh Young will not be here next week!

    • TAs are hosting a watch party of pre-recorded lectures instead

4 of 11

Checking in

Advanced material / Out-of-scope deductions

  • The expectation is that submitted work is aligned with course material and guidelines
  • The more we see it in your submission, the larger the deduction becomes
  • Unresolved cases result in a 0 + forfeiture of resubmission ☹

Resubmission Policy

  • Two resub cycles are open after the initial round of feedback per assessment
  • You can resubmit one technical component per cycle!
  • Exact dates for final submission days will be posted to Ed

5 of 11

Terms

  • Machine learning
    • Model
    • Machine learning algorithm
  • Data
    • Training Set
    • Example
    • Feature
    • Label
  • Types of ML
    • Regression
    • Classification
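
To make the vocabulary above concrete, here is a minimal sketch using a tiny pandas DataFrame. The column names and every value are invented for illustration and are not from the course materials. Each row is an example, each input column is a feature, and the column we want to predict is the label.

```python
import pandas as pd

# Hypothetical toy training set: all names and values are made up.
training_set = pd.DataFrame({
    'height_m': [3.0, 12.5, 20.0, 4.2],        # feature
    'leaf_length_cm': [5.0, 9.0, 14.0, 6.5],   # feature
    'is_evergreen': [0, 1, 1, 0],              # label
})

# Features: the inputs the model learns from (every column except the label)
features = training_set.loc[:, training_set.columns != 'is_evergreen']

# Label: the value we want the model to predict
labels = training_set['is_evergreen']

# One example is one row of the training set
first_example = training_set.iloc[0]

print(features.shape)   # (4, 2): 4 examples, 2 features
print(labels.tolist())  # [0, 1, 1, 0]
```

Predicting a category such as is_evergreen is a classification task. Predicting a number, such as height_m from the other columns, would be a regression task.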

6 of 11

Features Matter

  • There exist hundreds of different model types, but the most important thing is having good features to describe your data
    • Garbage In => Garbage Out

7 of 11

Decision Tree

  • Like a series of GIANT if/else branches! (Hence… tree)
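
As a rough sketch of the if/else idea, the function below is a hand-written stand-in for a decision tree. The function name, the questions, the thresholds, and the tree species are all invented for illustration. A real model learns its questions and thresholds from the training set instead of having them typed in.

```python
def classify_tree(height_m, has_needles):
    """Hand-written stand-in for a learned decision tree.

    Every question and threshold here is hypothetical.
    """
    if has_needles:
        # Left branch: trees with needles
        if height_m > 10:
            return 'pine'
        else:
            return 'juniper'
    else:
        # Right branch: trees with broad leaves
        if height_m > 15:
            return 'oak'
        else:
            return 'dogwood'


print(classify_tree(height_m=20, has_needles=True))   # pine
print(classify_tree(height_m=5, has_needles=False))   # dogwood
```

Each nested if is one internal node of the tree, and each return is a leaf holding the predicted label.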

8 of 11

Code Recap (classification)

General ML pipeline (at least to start)

  • For classification tasks

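# Imports needed by the pipeline below
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Separate data (data is assumed to be a pandas DataFrame with a 'target' column)
features = data.loc[:, data.columns != 'target']
labels = data['target']

# Create and train model
model = DecisionTreeClassifier()
model.fit(features, labels)

# Predict on some data (here, the same data used for training)
predictions = model.predict(features)

# Assess accuracy (fraction of predictions that match the labels)
accuracy_score(labels, predictions)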
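
The classification pipeline above leaves data undefined, so here is a self-contained sketch that runs end to end. The DataFrame contents are made up for illustration. Only the 'target' column name comes from the slide. random_state=0 is an added assumption so repeated runs give the same tree.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; every value is invented for illustration.
data = pd.DataFrame({
    'height_m': [3.0, 12.5, 20.0, 4.2, 18.0, 2.5],
    'has_needles': [0, 1, 1, 0, 1, 0],
    'target': ['deciduous', 'evergreen', 'evergreen',
               'deciduous', 'evergreen', 'deciduous'],
})

# Separate data
features = data.loc[:, data.columns != 'target']
labels = data['target']

# Create and train model
model = DecisionTreeClassifier(random_state=0)
model.fit(features, labels)

# Predict on the same rows the model was trained on
predictions = model.predict(features)

# Assess accuracy on the training data
train_accuracy = accuracy_score(labels, predictions)
print(train_accuracy)
```

Because the predictions are made on the same rows the model was trained on, this score says little about how the model would do on new data.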

9 of 11

Code Recap (regression)

General ML pipeline (at least to start)

  • For regression tasks

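# Imports needed by the pipeline below
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Separate data (data is assumed to be a pandas DataFrame with a numeric 'target' column)
features = data.loc[:, data.columns != 'target']
labels = data['target']

# Create and train model
model = DecisionTreeRegressor()
model.fit(features, labels)

# Predict on some data (here, the same data used for training)
predictions = model.predict(features)

# Assess MSE (mean squared error; lower is better)
mean_squared_error(labels, predictions)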
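
The regression pipeline above can be tried the same way. This is a self-contained sketch with invented numbers. Only the 'target' column name comes from the slide, and random_state=0 is an added assumption for repeatable runs.

```python
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data; 'target' is a made-up tree height in meters.
data = pd.DataFrame({
    'age_years': [2, 5, 10, 20, 40, 80],
    'trunk_diameter_cm': [3.0, 8.0, 15.0, 30.0, 55.0, 90.0],
    'target': [1.5, 3.0, 6.0, 11.0, 18.0, 25.0],
})

# Separate data
features = data.loc[:, data.columns != 'target']
labels = data['target']

# Create and train model
model = DecisionTreeRegressor(random_state=0)
model.fit(features, labels)

# Predict on the same rows the model was trained on
predictions = model.predict(features)

# Assess MSE on the training data
train_mse = mean_squared_error(labels, predictions)
print(train_mse)
```

The only changes from the classification version are the model class, a numeric label, and the metric: MSE measures how far predictions are from the labels, so smaller values are better.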

10 of 11

Group Work:

Best Practices

When you first work with this group:

  • Introduce yourself!
  • If possible, angle one of your screens so that everyone can discuss together

Tips:

  • Start by making sure everyone agrees to work on the same problem
  • Make sure everyone gets a chance to contribute!
  • Ask if everyone agrees and periodically ask each other questions!
  • Call TAs over for help if you need any!

11 of 11

Next Time

  • Evaluating Machine Learning

Before Next Time

  • Complete Lesson 15
    • Remember: lessons are not for points, but they do go towards Weekly Tokens
  • THA 4 due tomorrow night!
  • Get started on EDA or Milestone!
