1 of 24

Titanic - Machine Learning from Disaster

2 of 24

Meet the workshop leads

Sissy He

Second Year SE

Aisha Khatun

Masters CS

Vivian Guo

Second Year CFM

Sabina Gorbachev

Third Year CS/BBA

Molly Xu

Third Year CS

3 of 24

Problem Introduction

Brief intro to the Titanic Challenge

4 of 24

What is this Challenge about?

The Titanic

  • Sank on April 15, 1912, resulting in the death of 1502 out of 2224 passengers and crew

  • Some groups of people appear to have been more likely to survive than others

5 of 24

What is this Challenge about?

The Challenge

  • Use Titanic passenger data to predict who will survive and who will die

The Data

  • Passenger information (e.g., name, age, ticket price)
  • Provided as .csv files

6 of 24

Technologies

7 of 24

Poll:

Are you familiar with Python?

  1. Never heard of it
  2. Yes, a bit
  3. Use it all the time

8 of 24

Python

PANDAS

  • Easily import and work with large amounts of data stored in DataFrames (tables)

SCIKIT-LEARN

  • Tools for applying machine learning principles

9 of 24

Example Code

PANDAS

import pandas as pd

train_data = pd.read_csv("...")

SCIKIT-LEARN

from sklearn... import ...
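
As a minimal sketch of the pandas step above, the snippet below reads CSV text into a DataFrame. The inline CSV string and its three rows are made-up stand-ins for the real competition file, whose path is elided on the slide. The column names are assumed to follow the Kaggle Titanic data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real training file (made-up rows).
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
"""

# pd.read_csv accepts a file path or any file-like object.
train_data = pd.read_csv(io.StringIO(csv_text))

print(train_data.shape)   # (number of rows, number of columns)
print(train_data.head())  # first few rows of the table
```

With the real file, the same call would take the path to the .csv file instead of the StringIO object.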

10 of 24

What is AI and ML? (And the difference)

11 of 24

AI (Artificial Intelligence)

It's the quest to build machines that can reason, learn, and act intelligently, and it has barely begun. It covers the latest advances in machine learning, neural networks, and robots. (MIT)

12 of 24

13 of 24

Machine Learning

Machine learning is a subfield of artificial intelligence that gives computers the ability to learn without explicitly being programmed (MIT)

  • It is like giving the machine the same ability to learn that you have: it learns from “classes” (data) and then solves homework it hasn’t seen in class! (like you did ;))

14 of 24

Data Exploration

15 of 24

Data Exploration

What kind of information does each feature hold?

Let us explore a few features:

  • Sex
  • Pclass
  • Age
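
One way to look at these three features in pandas is sketched below. The rows are toy values invented for illustration, not the real Titanic data. Only the column names (Sex, Pclass, Age) come from the slide.

```python
import pandas as pd

# Toy rows invented for illustration; not the real Titanic data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Pclass": [3, 1, 3, 2, 3],
    "Age": [22.0, 38.0, None, 35.0, 54.0],
})

print(df["Sex"].value_counts())     # number of rows per category
print(df["Pclass"].value_counts())  # number of rows per passenger class
print(df["Age"].describe())         # count, mean, min, max, quartiles
print(df["Age"].isna().sum())       # number of missing ages
```

value_counts suits categorical columns such as Sex and Pclass, while describe summarises a numeric column such as Age.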

16 of 24

Poll:

How many rows are there with Pclass = 3?

  • 355
  • 184
  • 173
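
A question like this can be answered with a boolean mask. The sketch below uses a toy Pclass column with made-up values, so its count is unrelated to the real data.

```python
import pandas as pd

# Toy values invented for illustration; not the real Titanic data.
df = pd.DataFrame({"Pclass": [3, 1, 3, 2, 3, 1]})

# The comparison yields True/False per row; summing counts the True rows.
third_class_rows = int((df["Pclass"] == 3).sum())
print(third_class_rows)
```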

17 of 24

Prediction

18 of 24

Prediction

Imagine you are taken back in time and are aboard the Titanic. You see lots of people around you.

Can you look at someone, and predict whether or not they will survive the Titanic sinking?

Other use cases:

  • Is a comment/tweet hateful?
  • Is this an image of a Cat or a Dog?
  • Is the customer going to keep using your product?

19 of 24

Prediction

Let's predict survival based on only one feature.

Try Yourself

Survival prediction by Parch

Demo

Survival prediction by Gender
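
A one-feature prediction can be as simple as a hand-written rule. The sketch below assumes the rule "predict survival for female passengers" and scores it on toy rows invented for illustration. Both the rule and the data are assumptions, not the workshop's actual demo.

```python
import pandas as pd

# Toy rows invented for illustration; not the real Titanic data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 0, 1, 0],
})

# Assumed rule: predict 1 (survived) for women and 0 for men.
predictions = (df["Sex"] == "female").astype(int)

# Accuracy is the fraction of rows where the prediction matches the label.
accuracy = (predictions == df["Survived"]).mean()
print(f"accuracy: {accuracy:.2f}")
```

The same pattern works for Parch: replace the condition with a rule on that column and compare the resulting accuracy.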

20 of 24

Poll:

What is the accuracy if we only use Pclass for predicting survival?

  1. ~50%
  2. ~60%
  3. ~70%
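
One possible way to measure this is to predict, for each passenger class, the outcome that is more common in that class. The sketch below does so on toy rows invented for illustration, so its accuracy says nothing about the real data. It also scores the rule on the same rows it was built from.

```python
import pandas as pd

# Toy rows invented for illustration; not the real Titanic data.
df = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 0, 0, 1, 0],
})

# Survival rate within each passenger class.
rate = df.groupby("Pclass")["Survived"].mean()

# Predict survival only for classes where more than half survived.
rule = (rate > 0.5).astype(int)
predictions = df["Pclass"].map(rule)

accuracy = (predictions == df["Survived"]).mean()
print(rate)
print(f"accuracy: {accuracy:.2f}")
```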

21 of 24

Decision Trees

22 of 24

Decision Trees & Random Forest
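
A decision tree predicts by asking a sequence of yes/no questions about the features, and a random forest combines the votes of many such trees. The sketch below fits a small scikit-learn decision tree on toy rows invented for illustration. The is_female encoding and the max_depth value are assumptions made for this example.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy rows invented for illustration; not the real Titanic data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female", "male", "female"],
    "Pclass": [3, 1, 3, 2, 3, 2, 1, 3],
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1],
})

# scikit-learn needs numeric inputs, so encode Sex as 0/1.
X = pd.DataFrame({
    "is_female": (df["Sex"] == "female").astype(int),
    "Pclass": df["Pclass"],
})
y = df["Survived"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned questions as indented text.
print(export_text(tree, feature_names=list(X.columns)))
```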

23 of 24

Poll:

Is a Random Forest with 150 estimators and a max_depth of 5 better or worse?

  • Better
  • Worse
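
A comparison like this is usually settled by cross-validation. The sketch below scores the polled configuration against scikit-learn's default settings on synthetic data. The synthetic dataset and the choice of the defaults as the baseline are assumptions, so the printed scores do not answer the poll for the Titanic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for the Titanic features (assumption).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# The configuration from the poll.
candidate = RandomForestClassifier(n_estimators=150, max_depth=5, random_state=0)
# Assumed baseline: library defaults.
baseline = RandomForestClassifier(random_state=0)

# Mean accuracy over 5 cross-validation folds for each model.
candidate_score = cross_val_score(candidate, X, y, cv=5).mean()
baseline_score = cross_val_score(baseline, X, y, cv=5).mean()

print(f"150 estimators, max_depth=5: {candidate_score:.3f}")
print(f"default settings:            {baseline_score:.3f}")
```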

24 of 24

Thank You For Listening! :)