3 of 47

Methodology:

Collect data from the SpaceX API and Wikipedia
Clean the data for analysis
Perform exploratory data analysis and create interactive visualizations
Train and test a high-accuracy machine learning model for future SpaceY predictions

Summary of results:

SpaceX’s success came gradually over time; between 2010-06-04 and 2017-03-20, the most common landing outcome was “No attempt”
The map and dashboard show that KSC LC-39A and more recent SpaceX booster designs were most successful
Four types of models (all with 83.3% test accuracy) have been developed for future testing and use

Executive Summary

4 of 47

Introduction

We are SpaceY

Our goal: develop a private space program that can rival and eventually surpass SpaceX

Main question of this project:

How do we emulate the successes and learn from the failures of SpaceX?

How we will accomplish this:

Perform exploratory data analysis: which boosters, orbits, payload masses, and launch sites are most successful for SpaceX?
Create dashboards to extract further insights from SpaceX’s public data
Design machine learning models trained on SpaceX data to predict the success of our own missions in the future

6 of 47

Executive Summary

Data collection methodology:

Two data collection methods: SpaceX API calls and web scraping of Wikipedia

Data wrangling:

Relevant attributes extracted, ‘Landing Outcome’ column created, categorical features one-hot encoded

Perform exploratory data analysis (EDA) using data visualization and SQL
Create interactive visual analytics using Folium and Plotly Dash
Implement predictive analysis using classification models

train_test_split, grid search to optimize hyperparameters, model comparison by test accuracy

Methodology

7 of 47

Two data sources:

SpaceX API

Contains official launch metadata from SpaceX
Data requested using a REST API call in Python
Then, data loaded into a pandas dataframe and parsed as needed

Falcon 9 and Falcon Heavy Wikipedia Page

HTTP GET method used to create a BeautifulSoup object from the webpage
Table of launch records extracted from the HTML contents of the object

Data Collection

8 of 47

API V4 Data requested using a REST API call
Relevant information extracted from the raw output
Information processed into a pandas dataframe
Null values in PayloadMass column replaced with the mean of the non-null values
Notebook Link (github.com)

Data Collection – SpaceX API

SpaceX API V4

requests.get(spacex_url)

dataframe_falcon9
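The null-handling step on this slide can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the records are made-up placeholders rather than real API output, and only the column name PayloadMass comes from the slide.

```python
import pandas as pd

# Illustrative records only; these are NOT real SpaceX launch data.
records = [
    {"FlightNumber": 1, "PayloadMass": 500.0},
    {"FlightNumber": 2, "PayloadMass": None},
    {"FlightNumber": 3, "PayloadMass": 1500.0},
]

df = pd.DataFrame(records)

# Mean of the non-null values (pandas skips NaN by default).
payload_mean = df["PayloadMass"].mean()

# Replace the nulls with that mean and assign the result back.
df["PayloadMass"] = df["PayloadMass"].fillna(payload_mean)
```

Mean imputation keeps every launch record in the dataset, at the cost of slightly understating the variance of the payload column.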

9 of 47

Wikipedia data obtained using HTTP request
HTML response converted to BeautifulSoup object
Launch records table extracted from object
HTML table converted into pandas dataframe

Notebook Link (github.com)

Data Collection - Scraping

Wikipedia Page

requests.get(static_url)

table = html_tables[2]

dataframe
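The scraping steps on this slide can be sketched as below. To stay self-contained, the sketch parses a small inline HTML string instead of the live Wikipedia page, so the table contents, column names, and the variable launch_df are assumptions for illustration. The slide selects html_tables[2] on the real page; the stand-in document has only one table, so index 0 is used here.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Illustrative HTML standing in for the body of the HTTP response.
html = """
<table>
  <tr><th>Flight No.</th><th>Launch site</th></tr>
  <tr><td>1</td><td>Site A</td></tr>
  <tr><td>2</td><td>Site B</td></tr>
</table>
"""

# Convert the HTML text into a BeautifulSoup object.
soup = BeautifulSoup(html, "html.parser")

# Collect every table, then pick the one holding the launch records.
html_tables = soup.find_all("table")
table = html_tables[0]

# Header cells become column names.
headers = [th.get_text(strip=True) for th in table.find_all("th")]

# Each data row becomes a dict keyed by those column names.
rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row has no <td> cells, so it is skipped
        rows.append(dict(zip(headers, cells)))

launch_df = pd.DataFrame(rows)
```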

10 of 47

Preliminary data wrangling steps (from data collection process):

Irrelevant columns removed from both datasets
Null values of ‘PayloadMass’ updated

Number of launches at each site obtained
Number of occurrences of each orbit type calculated
Landing Outcomes column (1 if success, 0 if failure) created
Overall success rate of SpaceX missions determined

Notebook Link (github.com)

Data Wrangling
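The label-creation and encoding steps can be sketched as below. The outcome strings, the column names Outcome and Class, and the choice to count “No attempt” as 0 are assumptions for illustration; the slides only state that the new column is 1 for success and 0 for failure.

```python
import pandas as pd

# Illustrative rows only; the outcome labels are hypothetical.
df = pd.DataFrame({
    "Orbit": ["LEO", "GTO", "LEO", "ISS"],
    "Outcome": ["Success", "Failure", "No attempt", "Success"],
})

# Labels treated as a successful landing (an assumption for this sketch).
success_labels = {"Success"}

# 1 if the landing succeeded, 0 otherwise.
df["Class"] = df["Outcome"].isin(success_labels).astype(int)

# The mean of a 0/1 column is the overall success rate.
success_rate = df["Class"].mean()

# One-hot encode the categorical orbit column for later modeling.
features = pd.get_dummies(df[["Orbit"]], columns=["Orbit"])
```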

11 of 47

Charts plotted:

Flight Number vs. Launch Site
Payload Mass vs. Launch Site
Success Rate by Orbit Type
Flight Number vs. Orbit Type
Payload Mass vs. Orbit Type
Success Rate by Year
These charts contain insights about SpaceX and the data as a whole
Notebook Link (github.com)

EDA with Data Visualization
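The two success-rate charts reduce to a grouped mean over a 0/1 outcome column. A minimal sketch, assuming hypothetical column names (Orbit, Year, Class) and placeholder rows that are not real launch data:

```python
import pandas as pd

# Illustrative rows only; NOT real SpaceX launch data.
launches = pd.DataFrame({
    "Orbit": ["LEO", "LEO", "GTO", "GTO", "SSO"],
    "Year": [2015, 2016, 2016, 2017, 2017],
    "Class": [0, 1, 0, 1, 1],
})

# Success rate per orbit type: mean of the 0/1 outcome within each group.
rate_by_orbit = launches.groupby("Orbit")["Class"].mean()

# Success rate per year, computed the same way.
rate_by_year = launches.groupby("Year")["Class"].mean()
```

Each resulting Series can then be passed to a plotting library to draw the bar or line chart.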

12 of 47

Examples of queries performed:

List of all launch sites
Total payload mass carried by boosters launched for NASA
Date of first successful ground pad landing
List of all booster versions that have carried the maximum payload mass
Rank the landing outcomes between June 2010 and March 2017 from most to least common

These queries provide context to the dataset and SpaceX’s launches
Notebook Link (github.com)

EDA with SQL
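The ranking query can be sketched with Python's built-in sqlite3 module. The table name, column names, and rows below are placeholders invented for illustration; only the date range comes from the slides. Dates are stored as ISO strings, so BETWEEN compares them correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (launch_date TEXT, landing_outcome TEXT)")

# Illustrative rows only; NOT real SpaceX launch records.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [
        ("2011-01-01", "No attempt"),
        ("2012-01-01", "No attempt"),
        ("2013-01-01", "No attempt"),
        ("2014-01-01", "Success"),
        ("2015-01-01", "Success"),
        ("2016-01-01", "Failure"),
        ("2018-01-01", "Success"),  # outside the date range, so excluded
    ],
)

# Count each landing outcome in the window and rank from most to least common.
query = """
    SELECT landing_outcome, COUNT(*) AS n
    FROM launches
    WHERE launch_date BETWEEN '2010-06-04' AND '2017-03-20'
    GROUP BY landing_outcome
    ORDER BY n DESC, landing_outcome
"""
ranking = conn.execute(query).fetchall()
conn.close()
```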

13 of 47

Folium Map Objects Used:

Marker

Clearly marks points of interest on the map

MarkerCluster

Allows closely-clustered launch outcomes to be seen clearly

Polyline (Distance Line)

Shows calculated distance and straight-line path between two points

Notebook Link (github.com)

Build an Interactive Map with Folium

(closed)

(open)
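The slides do not say how the distance shown on the distance line was calculated. One common choice is the haversine great-circle formula. A minimal sketch, assuming a spherical Earth with a mean radius of 6371 km; the function name and the constant are illustrative:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


# One degree of latitude along a meridian, roughly 111 km.
one_degree_km = haversine_km(0.0, 0.0, 1.0, 0.0)
```

The two endpoints passed to the function are the same coordinates that would define the line drawn on the map.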

14 of 47

Pie Chart:

Visualizes success rates for all sites or for a specific site
Customizable via the launch site dropdown

Scatter Plot:

Visualizes successes and failures by payload mass and booster version
Customizable via the dropdown and the payload range slider

Dashboard Python Code (github.com)

Build a Dashboard with Plotly Dash
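Both charts depend on filtering the launch table by the dropdown and slider values. The sketch below shows only that filtering logic, not the Dash layout or callbacks. The function name, the column names, the "ALL" sentinel, and the rows are assumptions for illustration.

```python
import pandas as pd


def filter_launches(df, site, payload_range):
    """Return launches for one site (or "ALL") within a payload range in kg."""
    low, high = payload_range
    # Keep payloads inside the slider range, inclusive at both ends.
    mask = df["PayloadMass"].between(low, high)
    if site != "ALL":
        # Narrow further to the site chosen in the dropdown.
        mask = mask & (df["LaunchSite"] == site)
    return df[mask]


# Illustrative rows only; NOT real SpaceX launch data.
demo = pd.DataFrame({
    "LaunchSite": ["Site A", "Site A", "Site B", "Site B"],
    "PayloadMass": [1000, 5000, 3000, 9000],
    "Class": [1, 0, 1, 1],
})

selected = filter_launches(demo, "Site B", (0, 6000))
```

Keeping the filter as a plain function makes it possible to test it without starting the dashboard server.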

15 of 47

Data split into training and test subsets (train_test_split)
Models chosen (logistic regression, support vector machine, decision tree, k-nearest neighbors)
Grid search applied to each model (finds best hyperparameters for accuracy)
Optimized models fit with training data and tested with test data
Models evaluated using test accuracy and confusion matrices created

Notebook Link (github.com)

Predictive Analysis (Classification)
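The workflow on this slide can be sketched for one of the four model types. The sketch uses synthetic data from make_classification rather than the SpaceX dataset, and the sample count, parameter grid, split ratio, and random seeds are assumptions for illustration, not the values used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; NOT the SpaceX launch dataset.
X, y = make_classification(n_samples=90, n_features=8, random_state=0)

# Hold out a test subset; stratify so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2, stratify=y
)

# Grid search: cross-validate each candidate value on the training data only.
param_grid = {"C": [0.01, 0.1, 1.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X_train, y_train)

# The best estimator is refit on the full training set, then scored on the
# held-out test set (accuracy is the default score for classifiers).
test_accuracy = search.score(X_test, y_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, search.predict(X_test), labels=[0, 1])
```

Repeating the same steps with a support vector machine, a decision tree, and k-nearest neighbors, each with its own parameter grid, gives the four test accuracies to compare.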

16 of 47

Exploratory data analysis results:

SpaceX became more successful over time
SpaceX also became more adventurous (greater orbit variety, heavier payloads, etc.)

Interactive analytics demo:

Predictive analysis results

All four models obtained the exact same test accuracy (83.33%)

Summary of Results

18 of 47

Higher-numbered (more recent) flights have a high success rate and often launch from Florida (CCAFS or KSC)

Flight Number vs. Launch Site

19 of 47

VAFB has a maximum payload mass under 10000 kg
Launches with higher payload masses have a higher success rate

Payload vs. Launch Site

20 of 47

Success Rate vs. Orbit Type

ES-L1, GEO, HEO, and SSO orbit types have the highest success rate
SO is the orbit type with the lowest success rate

21 of 47

Flight Number vs. Orbit Type

As flight number increases, the variety of orbit types attempted increases.

22 of 47

Most of the orbit types are used for specific payload masses (SSO for small masses, VLEO for large masses, etc…).

Payload vs. Orbit Type

23 of 47

Success rate increases over time.

Launch Success Yearly Trend

24 of 47

All Launch Site Names

Names of the launch sites:

25 of 47

The first 5 records where launch sites begin with `CCA`:

Launch Site Names Begin with 'CCA'

26 of 47

The total payload mass carried by boosters launched for NASA:

Total Payload Mass

27 of 47

The average payload mass carried by booster version F9 v1.1:

Average Payload Mass by F9 v1.1

28 of 47

The first successful landing outcome on ground pad:

First Successful Ground Landing Date

29 of 47

Boosters which have successfully landed on a drone ship and had a payload mass greater than 4000 kg but less than 6000 kg:

Successful Drone Ship Landing with Payload between 4000 and 6000

30 of 47

Count of successful and failed mission outcomes:

Total Number of Successful and Failed Mission Outcomes

31 of 47

Boosters that have carried the maximum payload mass:

Boosters Carried Maximum Payload

32 of 47

2015 failed drone ship landing outcomes:

2015 Launch Records

33 of 47

Landing outcomes ranking between 2010-06-04 and 2017-03-20:

Rank Landing Outcomes Between 2010-06-04 and 2017-03-20

Note: the most common landing outcome is “No attempt”

35 of 47

Folium Launch Sites Map

The four SpaceX launch sites are shown on this map.
Three of the four sites are next to each other along the Florida coastline.

36 of 47

Folium VAFB Clusters Map

This map shows the cluster of launch outcomes for VAFB SLC-4E.
The cluster makes it easy to see that 4 out of 10 VAFB landings have been successful.

37 of 47

Folium Distance Line Map

This map shows the closest straight-line distance between CCAFS SLC-40 and the coastline.

39 of 47

Total Successful Launches by Launch Site

KSC LC-39A accounted for the largest share of all successful launches.
CCAFS SLC-40 accounted for the smallest share.

40 of 47

Launch Site with the Highest Success Rate

In addition to having the most successful launches, KSC LC-39A had the highest success rate among launch sites.

41 of 47

Payload Mass vs. Success (0-7000 kg)

Booster versions FT, B4, and B5 (the 3 newest booster versions) make up most of the successes.
In turn, v1.0 and v1.1 (the 2 oldest booster versions) make up many of the failures.

43 of 47

Classification Accuracy

All four models have identical testing accuracy.

44 of 47

All four models generate the same confusion matrix.
Important to note: the models have a 20% false positive rate vs. a 0% false negative rate.

Confusion Matrix

45 of 47

SpaceX’s success came gradually; between 2010-06-04 and 2017-03-20, the most common landing outcome was “No attempt”
SpaceX has increased mission complexity over time, especially when it comes to orbit types and payload masses
3 of SpaceX’s 4 launch sites (including their most successful site) are near each other and the Florida coastline
KSC LC-39A is the most successful launch site, and the more recent SpaceX booster designs are the most successful booster versions
All the classification models created are equally proficient in terms of testing accuracy

Conclusions
