1 of 47

Ryan Eck

Last updated: 2/17/2023

2 of 47

Table of Contents

3 of 47

  • Methodology:
    1. Collect data from the SpaceX API and Wikipedia
    2. Clean the data for analysis
    3. Perform exploratory data analysis and create interactive visualizations
    4. Train and test a high-accuracy machine learning model for future SpaceY predictions
  • Summary of results:
    • SpaceX’s success came gradually over time, with the most common landing outcome from 2010 to 2017 being “No attempt”
    • The map and dashboard show that KSC LC-39A and more recent SpaceX booster designs were most successful
    • Four types of models (all with 83.3% test accuracy) have been developed for future testing and use

Executive Summary

4 of 47

Introduction

  • We are SpaceY
    • Our goal: develop a private space program that can rival and eventually surpass SpaceX
  • Main question of this project:
    • How do we emulate the successes and learn from the failures of SpaceX?
  • How we will accomplish this:
    • Perform exploratory data analysis: which boosters, orbits, payload masses, and launch sites are most successful for SpaceX?
    • Create dashboards to extract further insights from SpaceX’s public data
    • Design machine learning models trained on SpaceX data to predict the success of our own missions in the future

5 of 47

Section 1

6 of 47

Executive Summary

  • Data collection methodology:
    • Two data collection methods: SpaceX API calls and web scraping of Wikipedia
  • Data wrangling:
    • Relevant attributes extracted, ‘Landing Outcome’ created, one-hot encoding
  • Perform exploratory data analysis (EDA) using data visualization and SQL
  • Create interactive visual analytics using Folium and Plotly Dash
  • Implement predictive analysis using classification models
    • train_test_split, grid search to optimize parameters, test accuracy model comparison

Methodology

7 of 47

Two data sources:

  1. SpaceX API
    • Contains official launch metadata from SpaceX
    • Data requested using a REST API call in Python
    • Then, data loaded into a pandas dataframe and parsed as needed
  2. Falcon 9 and Falcon Heavy Wikipedia Page
    • HTTP GET method used to create a BeautifulSoup object from the webpage
    • Table of launch records extracted from the HTML contents of the object

Data Collection

8 of 47

  1. API V4 Data requested using a REST API call
  2. Relevant information extracted from the raw output
  3. Information processed into a pandas dataframe
  4. Null values in PayloadMass column replaced with the mean of the non-null values

  • Notebook Link (github.com)

Data Collection – SpaceX API

Flow: SpaceX API V4 → requests.get(spacex_url) → dataframe_falcon9
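
The last two steps of this flow (loading into pandas and replacing null PayloadMass values with the mean) can be sketched as below. This is a minimal illustration, not the notebook's code: the sample records and the helper name build_launch_frame are hypothetical, and the requests.get(spacex_url) call is left out so the sketch runs offline.

```python
import pandas as pd

# Hypothetical records standing in for parsed API output; the real
# response has many more fields and a different nesting.
sample_launches = [
    {"FlightNumber": 1, "BoosterVersion": "Falcon 9", "PayloadMass": 6000.0},
    {"FlightNumber": 2, "BoosterVersion": "Falcon 9", "PayloadMass": None},
    {"FlightNumber": 3, "BoosterVersion": "Falcon 9", "PayloadMass": 4000.0},
]


def build_launch_frame(records):
    """Load parsed records into a dataframe and fill missing payload masses."""
    df = pd.DataFrame(records)
    # Series.mean() skips NaN by default, so this is the mean of the
    # non-null values only.
    mean_mass = df["PayloadMass"].mean()
    df["PayloadMass"] = df["PayloadMass"].fillna(mean_mass)
    return df


dataframe_falcon9 = build_launch_frame(sample_launches)
```

With these sample values, the missing payload mass is filled with 5000.0, the mean of the two known masses.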

9 of 47

  1. Wikipedia data obtained using HTTP request
  2. HTML response converted to BeautifulSoup object
  3. Launch records table extracted from object
  4. HTML table converted into pandas dataframe

  • Notebook Link (github.com)

Data Collection - Scraping

Flow: Wikipedia Page → requests.get(static_url) → table = html_tables[2] → dataframe
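
The table-extraction step can be illustrated with a small offline sketch. The deck uses BeautifulSoup; to stay dependency-free, this sketch swaps in the standard library's html.parser instead. The class name TableExtractor and the sample HTML are hypothetical, and the sketch does not handle nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example, whitespace between rows) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Hypothetical stand-in for the downloaded page contents.
sample_html = """
<table>
  <tr><th>Flight No.</th><th>Launch site</th></tr>
  <tr><td>1</td><td>CCAFS</td></tr>
  <tr><td>2</td><td>VAFB</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(sample_html)

# The deck selects html_tables[2]; this sample page has a single table.
header, *rows = parser.tables[0]
records = [dict(zip(header, row)) for row in rows]
```

The resulting list of dictionaries is the kind of structure that can be passed to pandas to build the final dataframe.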

10 of 47

  • Preliminary data wrangling steps (from data collection process):
    • Irrelevant columns removed from both datasets
    • Null values of ‘PayloadMass’ updated
  • Number of launches at each site obtained
  • Number and occurrence of each orbit calculated
  • Landing Outcomes column (1 if success, 0 if failure) created
  • Overall success rate of SpaceX missions determined

  • Notebook Link (github.com)

Data Wrangling
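
A rough sketch of the wrangling steps above, using a tiny made-up dataframe. The column names (LaunchSite, Orbit, Outcome, Class) and the outcome labels are assumptions for illustration, as is the rule that a label starting with "True" marks a successful landing.

```python
import pandas as pd

# Made-up rows for illustration; not the actual dataset.
df = pd.DataFrame(
    {
        "LaunchSite": ["CCAFS SLC-40", "CCAFS SLC-40", "VAFB SLC-4E", "KSC LC-39A"],
        "Orbit": ["LEO", "GTO", "PO", "GTO"],
        "Outcome": ["None None", "False ASDS", "True RTLS", "True ASDS"],
    }
)

# Number of launches at each site, and number of occurrences of each orbit.
launches_per_site = df["LaunchSite"].value_counts()
launches_per_orbit = df["Orbit"].value_counts()

# Assumption: an outcome is a success when its label starts with "True".
df["Class"] = df["Outcome"].str.startswith("True").astype(int)

# The mean of a 0/1 column is the overall success rate.
success_rate = df["Class"].mean()
```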

11 of 47

Charts plotted:

  1. Flight Number vs. Launch Site
  2. Payload Mass vs. Launch Site
  3. Success Rate by Orbit Type
  4. Flight Number vs. Orbit Type
  5. Payload Mass vs. Orbit Type
  6. Success Rate by Year

  • These charts contain insights about SpaceX and the data as a whole
  • Notebook Link (github.com)

EDA with Data Visualization
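
The two success-rate charts rest on simple grouped means. A minimal sketch of that aggregation follows, assuming a 0/1 Class column and hypothetical Orbit and Date columns; the rows are invented for illustration.

```python
import pandas as pd

# Invented rows; Class is 1 for a successful landing and 0 otherwise.
df = pd.DataFrame(
    {
        "Date": ["2015-01-01", "2015-02-01", "2016-01-01", "2017-01-01", "2017-02-01"],
        "Orbit": ["LEO", "LEO", "GTO", "GTO", "SSO"],
        "Class": [0, 1, 0, 1, 1],
    }
)

# Success rate by orbit type, highest first.
success_by_orbit = df.groupby("Orbit")["Class"].mean().sort_values(ascending=False)

# Success rate by year, using the year extracted from the launch date.
df["Year"] = pd.to_datetime(df["Date"]).dt.year
success_by_year = df.groupby("Year")["Class"].mean()
```

Each resulting series can then be plotted directly, for example as a bar chart by orbit and a line chart by year.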

12 of 47

Examples of queries performed:

  • List of all launch sites
  • Total payload mass of boosters launched by NASA
  • Date of first successful ground pad landing
  • List of all booster versions that have carried the maximum payload mass
  • Rank the landing outcomes between June 2010 and March 2017 from most to least common

  • These queries provide context to the dataset and SpaceX’s launches
  • Notebook Link (github.com)

EDA with SQL
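
Several of the queries listed above can be sketched with Python's built-in sqlite3 module. The table name, column names, and rows below are hypothetical and do not reproduce the project's database or its results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE launches (
        date TEXT, launch_site TEXT, customer TEXT,
        payload_mass_kg REAL, landing_outcome TEXT
    )
    """
)
# Illustrative rows only; not the actual launch records.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?, ?, ?)",
    [
        ("2010-06-04", "CCAFS SLC-40", "SpaceX", 0.0, "No attempt"),
        ("2012-10-08", "CCAFS SLC-40", "NASA", 1000.0, "No attempt"),
        ("2015-12-22", "CCAFS SLC-40", "Other", 2000.0, "Success (ground pad)"),
        ("2016-01-17", "VAFB SLC-4E", "NASA", 500.0, "Failure (drone ship)"),
        ("2017-02-19", "KSC LC-39A", "NASA", 2500.0, "Success (ground pad)"),
        ("2018-01-01", "KSC LC-39A", "Other", 3000.0, "Success (drone ship)"),
    ],
)

# List of all launch sites.
sites = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT launch_site FROM launches ORDER BY launch_site"
    )
]

# Total payload mass of boosters launched by NASA.
nasa_total = conn.execute(
    "SELECT SUM(payload_mass_kg) FROM launches WHERE customer = 'NASA'"
).fetchone()[0]

# Date of the first successful ground pad landing (ISO dates sort as text).
first_ground = conn.execute(
    "SELECT MIN(date) FROM launches WHERE landing_outcome = 'Success (ground pad)'"
).fetchone()[0]

# Landing outcomes in a date window, ranked from most to least common.
ranking = conn.execute(
    """
    SELECT landing_outcome, COUNT(*) AS n
    FROM launches
    WHERE date BETWEEN '2010-06-04' AND '2017-03-20'
    GROUP BY landing_outcome
    ORDER BY n DESC, landing_outcome
    """
).fetchall()
```

Storing dates as ISO-formatted text keeps MIN and BETWEEN comparisons correct without a dedicated date type.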

13 of 47

Folium Map Objects Used:

  • Marker
    • Clearly marks points of interest on the map
  • MarkerCluster
    • Allows closely-clustered launch outcomes to be seen clearly
  • Polyline (Distance Line)
    • Shows calculated distance and straight-line path between two points
  • Notebook Link (github.com)

Build an Interactive Map with Folium

(Marker cluster screenshots: closed and open)
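
The distance shown along a distance line can be computed with the haversine formula. The sketch below is one common way to do this and is not necessarily how the notebook does it; the coordinates are hypothetical placeholders, not the values used on the map.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical coordinates for a launch site and a nearby coastline point.
launch_site = (28.5632, -80.5768)
coast_point = (28.5632, -80.5680)
distance_km = haversine_km(*launch_site, *coast_point)
```

The same two coordinate pairs can serve as the endpoints of the Polyline, with the computed distance shown as a label.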

14 of 47

Pie Chart:

  • Visualizes success rates for all sites or for a specific site
  • Customizable via the launch site dropdown

Scatter Plot:

  • Visualizes successes and failures by payload mass and booster version
  • Customizable via the dropdown and the payload range slider

Dashboard Python Code (github.com)

Build a Dashboard with Plotly Dash

15 of 47

  1. Data split into training and test subsets (train_test_split)
  2. Models chosen (logistic regression, support vector machine, decision tree, k-nearest neighbors)
  3. Grid search applied to each model (finds best hyperparameters for accuracy)
  4. Optimized models fit with training data and tested with test data
  5. Models evaluated using test accuracy and confusion matrices

  • Notebook Link (github.com)

Predictive Analysis (Classification)
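
The workflow above can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the notebook's code: the data is synthetic, and the parameter grids, split size, and random seeds are hypothetical and deliberately small.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the real features come from the SpaceX dataset.
X, y = make_classification(n_samples=90, n_features=8, random_state=0)

# Step 1: split into training and test subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# Step 2: the four model types, each with a hypothetical parameter grid.
candidates = {
    "logistic regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.01, 0.1, 1]},
    ),
    "support vector machine": (
        SVC(),
        {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
    ),
    "decision tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [2, 4, 8]},
    ),
    "k-nearest neighbors": (
        KNeighborsClassifier(),
        {"n_neighbors": [3, 5, 7]},
    ),
}

# Steps 3 to 5: grid search on the training data, then score on the test data.
test_accuracy = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5)
    search.fit(X_train, y_train)
    # score() uses the best estimator refit on the full training set.
    test_accuracy[name] = search.score(X_test, y_test)
```

With a small test subset, only a few distinct accuracy values are possible, so several models landing on the same test accuracy is not unusual.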

16 of 47

  • Exploratory data analysis results:
    • SpaceX became more successful over time
    • SpaceX also became more adventurous (greater orbit variety, increased payload mass, etc.)
  • Interactive analytics demo:

  • Predictive analysis results:
    • All four models obtained the exact same test accuracy (83.33%)

Summary of Results

17 of 47

Section 2

18 of 47

  • Large flight numbers (which indicate more recent flights) have a high success rate and often take off from Florida (CCAFS or KSC)

Flight Number vs. Launch Site

19 of 47

  • VAFB has a maximum payload mass under 10000 kg
  • Higher payload masses have a higher success rate

Payload vs. Launch Site

20 of 47

Success Rate vs. Orbit Type

  • ES-L1, GEO, HEO, and SSO orbit types have the highest success rate
  • SO is the orbit type with the lowest success rate

21 of 47

Flight Number vs. Orbit Type

  • As flight number increases, the variety of orbit types attempted increases.

22 of 47

  • Most orbit types are used for specific payload mass ranges (SSO for small masses, VLEO for large masses, etc.).

Payload vs. Orbit Type

23 of 47

  • Success rate increases over time.

Launch Success Yearly Trend

24 of 47

All Launch Site Names

  • Names of the launch sites:

25 of 47

  • The first 5 records where launch sites begin with `CCA`:

Launch Site Names Begin with 'CCA'

26 of 47

  • The total payload carried by boosters from NASA:

Total Payload Mass

27 of 47

  • The average payload mass carried by booster version F9 v1.1:

Average Payload Mass by F9 v1.1

28 of 47

  • The date of the first successful ground pad landing:

First Successful Ground Landing Date

29 of 47

  • Boosters which have successfully landed on a drone ship and had a payload mass greater than 4000 kg but less than 6000 kg:

Successful Drone Ship Landing with Payload between 4000 and 6000

30 of 47

  • Count of successful and failed mission outcomes:

Total Number of Successful and Failure Mission Outcomes

31 of 47

  • Boosters that have carried the maximum payload mass:

Boosters Carried Maximum Payload

32 of 47

  • 2015 failed drone ship landing outcomes:

2015 Launch Records

33 of 47

Landing outcomes ranking between 2010-06-04 and 2017-03-20:

Rank Landing Outcomes Between 2010-06-04 and 2017-03-20

  • Note: the most common landing outcome is “No attempt”

34 of 47

Section 3

35 of 47

Folium Launch Sites Map

  • The four SpaceX launch sites are shown on this map.
  • Three of the four sites are next to each other along the Florida coastline.

36 of 47

Folium VAFB Clusters Map

  • This map shows the cluster of launch outcomes for VAFB SLC-4E.
  • The cluster makes it easy to see that 4 out of 10 VAFB landings have been successful.

37 of 47

Folium Distance Line Map

  • This map shows the shortest straight-line distance between CCAFS SLC-40 and the coastline.

38 of 47

Section 4

39 of 47

Total Successful Launches by Launch Site

  • KSC LC-39A accounted for the largest share of all successful launches.
  • Meanwhile, CCAFS SLC-40 accounted for the smallest share of successful launches.

40 of 47

Launch Site with the Highest Success Rate

  • In addition to having the most successful launches, KSC LC-39A had the highest success rate among launch sites.

41 of 47

Payload Mass vs. Success (0-7000 kg)

  • Booster versions FT, B4, and B5 (the 3 newest booster versions) make up most of the successes.
  • In turn, v1.0 and v1.1 (the 2 oldest booster versions) make up many of the failures.

42 of 47

Section 5

43 of 47

Classification Accuracy

  • All four models have identical testing accuracy.

44 of 47

  • All four models generate the same confusion matrix.
  • Important to note: the models have a 20% false positive rate vs. a 0% false negative rate.

Confusion Matrix
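
The rates quoted from a confusion matrix depend on which denominator is used. The sketch below computes the common ones from raw counts. The counts are illustrative placeholders chosen to give 83.33% accuracy; they are an assumption and are not read from the deck's matrix.

```python
def confusion_rates(tn, fp, fn, tp):
    """Summary rates for a binary confusion matrix given its four counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Share of actual negatives that were predicted positive.
        "false_positive_rate": fp / (fp + tn),
        # Share of actual positives that were predicted negative.
        "false_negative_rate": fn / (fn + tp),
        # Share of positive predictions that were wrong.
        "false_discovery_rate": fp / (fp + tp),
    }


# Placeholder counts (assumption): 18 test samples, 15 classified correctly.
rates = confusion_rates(tn=3, fp=3, fn=0, tp=12)
```

With these placeholder counts, 3 of 15 predicted successes are wrong (20%), while 3 of 6 actual failures are predicted as successes (50%), so it helps to state which denominator a quoted false positive rate uses.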

45 of 47

  • SpaceX’s success came gradually, with the most common landing outcome from 2010 to 2017 being “No attempt”
  • SpaceX has increased mission complexity over time, especially when it comes to orbit types and payload masses
  • 3 of SpaceX’s 4 launch sites (including their most successful site) are near each other on the Florida coastline
  • KSC LC-39A is the most successful launch site, and the more recent SpaceX booster designs are the most successful booster versions
  • All the classification models created are equally proficient in terms of testing accuracy

Conclusions

46 of 47

  • GitHub Repository
  • Datasets: SpaceX and Wikipedia

Appendix

47 of 47