1 of 48

Sonya Lawrence

07/31/2022

2 of 48

  • Executive Summary
  • Introduction
  • Methodology
  • Results
  • Conclusion
  • Appendix

Outline

3 of 48

  • Summary of methodologies
    • Data Collection using an API
    • Data Collection through Web Scraping
    • Data Wrangling
    • Exploratory Data Analysis Using SQL
    • Exploratory Data Analysis using Data Visualization
    • Interactive Visual Analytics using Folium
    • Machine Learning Model Building and Predictions
  • Summary of all results
    • Exploratory data analysis results presented using visual aids
    • The optimal machine learning model was acquired
    • Interesting insights gained from data

Executive Summary

4 of 48

Introduction

Project background and context

SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers charge upward of 165 million dollars each. Much of the savings comes from SpaceX's ability to reuse the first stage. Therefore, if we can determine whether the first stage will land, we can determine the cost of a launch. This information can be used if an alternate company wants to bid against SpaceX for a rocket launch.

Problem questions:

  1. What factors determine if the rocket will land successfully?
  2. What factors determine the likelihood that SpaceX will reuse the first stage of the rocket?

5 of 48

Section 1

6 of 48

  • Data collection methodology:
    • SpaceX REST API and web scraping from Wikipedia
  • Data wrangling methodology:
    • Cleaned data accounting for missing values
    • One-hot encoding applied to categorical data for transformation
  • Exploratory data analysis (EDA) methodology:
    • Python data visualization and SQL queries
    • Interactive visual analytics using Folium and Plotly Dash
  • Predictive analysis using classification models methodology:
    • Build, tune, and evaluate classification models

Methodology
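The one-hot encoding step listed above can be sketched as follows. This is a minimal illustration, assuming the data sits in a Pandas DataFrame; the column names `Orbit` and `PayloadMass` and the sample rows are hypothetical and are not taken from the project data.

```python
import pandas as pd

# Hypothetical sample rows; the real project data has many more columns.
launches = pd.DataFrame({
    "Orbit": ["LEO", "GTO", "LEO", "SSO"],
    "PayloadMass": [500.0, 3100.0, 2000.0, 1500.0],
})

# Replace the categorical column with one 0/1 indicator column per category.
encoded = pd.get_dummies(launches, columns=["Orbit"], dtype=int)
print(encoded)
```

Numeric columns pass through unchanged, so the result can be fed directly to a classifier.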

7 of 48

Data Collection

    • Fetched API Records:
      • Get Payload Data
      • Get Booster Version
      • Get Launch Site
      • Get Core Data

SpaceX API

    • Normalized .json file
    • Parsed through file with Pandas DataFrame
    • Filtered for Falcon 9 data
    • Filled missing Payload Mass data with mean value

Organized Data

    • Saved cleaned data to a .csv file

Saved File

    • Collected Falcon 9 launch records using requests.get('url')
    • Defined functions to extract specific HTML data

Wikipedia

    • Parsed through HTML data using BeautifulSoup()
    • Cleaned and structured the data into a Pandas DataFrame

Organized Data

    • Saved cleaned data to a .csv file

Saved File

8 of 48

  • Get requests used to extract data from the SpaceX API
  • GitHub Repository

Data Collection – SpaceX API
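The API cleaning steps (normalize the JSON, keep Falcon 9 rows, fill missing payload mass with the mean) might look roughly like the sketch below. The records are hypothetical stand-ins for the API response, the field names are assumptions, and the `requests.get` call is left out so the example runs offline.

```python
import pandas as pd

# Hypothetical records standing in for the JSON returned by the API.
sample_records = [
    {"flight_number": 1, "rocket": {"name": "Falcon 1"}, "payload": {"mass_kg": 20.0}},
    {"flight_number": 6, "rocket": {"name": "Falcon 9"}, "payload": {"mass_kg": None}},
    {"flight_number": 7, "rocket": {"name": "Falcon 9"}, "payload": {"mass_kg": 500.0}},
    {"flight_number": 8, "rocket": {"name": "Falcon 9"}, "payload": {"mass_kg": 3100.0}},
]

def clean_launches(records):
    """Flatten nested JSON, keep Falcon 9 rows, fill missing payload mass."""
    # Nested keys become dotted column names such as "rocket.name".
    df = pd.json_normalize(records)
    falcon9 = df[df["rocket.name"] == "Falcon 9"].copy()
    # Make sure the column is numeric so that missing values become NaN.
    mass = pd.to_numeric(falcon9["payload.mass_kg"])
    falcon9["payload.mass_kg"] = mass.fillna(mass.mean())
    return falcon9.reset_index(drop=True)

cleaned = clean_launches(sample_records)
print(cleaned)
# cleaned.to_csv("falcon9_launches.csv", index=False)  # hypothetical file name
```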

9 of 48

  • A Python BeautifulSoup object was created from the SpaceX Falcon 9 data
  • GitHub Repository

Data Collection - Web Scraping
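As a rough illustration of extracting table rows from HTML, the sketch below parses a small hypothetical table. It uses the standard-library `html.parser` in place of BeautifulSoup so that it runs without extra packages; the table contents and column names are made up for the example.

```python
from html.parser import HTMLParser

class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical HTML standing in for the downloaded page.
sample_html = """
<table>
  <tr><th>Flight No.</th><th>Launch site</th><th>Payload mass</th></tr>
  <tr><td>1</td><td>CCAFS</td><td>0 kg</td></tr>
  <tr><td>2</td><td>CCAFS</td><td>525 kg</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(sample_html)
header, *records = parser.rows
launches = [dict(zip(header, row)) for row in records]
print(launches)
```

The resulting list of dictionaries can be passed to `pandas.DataFrame(launches)` for further cleaning.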

10 of 48

  • Sometimes a landing was attempted but failed:
    • True Ocean means the first stage landed successfully in the ocean
    • False Ocean means the first stage failed to land in the ocean

  • Converted those outcomes into training labels:
    • `1` means the landing was successful
    • `0` means the landing was unsuccessful

  • GitHub Repository

Data Wrangling

Performed exploratory data analysis and determined training labels

Calculated the number of launches at each site

Calculated the number of occurrences of each orbit type

Created a landing outcome label called Class from the Outcomes column

Exported the cleaned data to a .csv file
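A minimal sketch of the labeling step, assuming every successful outcome string begins with `True` (as in `True Ocean`). The outcome values other than `True Ocean` and `False Ocean` are illustrative assumptions, and the function name is hypothetical.

```python
from collections import Counter

# Illustrative outcome strings; only the Ocean values appear in the slides.
outcomes = ["True Ocean", "False Ocean", "True ASDS",
            "None None", "True RTLS", "False ASDS"]

def landing_class(outcome):
    """Return 1 for a successful landing, 0 otherwise."""
    return 1 if outcome.startswith("True") else 0

labels = [landing_class(o) for o in outcomes]
outcome_counts = Counter(outcomes)        # occurrences of each outcome
success_rate = sum(labels) / len(labels)  # share of successful landings
print(labels, success_rate)
```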

11 of 48

  • Charts plotted:
    • Flight number vs. Payload mass
    • Flight number vs. Launch site
    • Payload mass vs. Launch site
    • Success rate of each orbit type
    • Flight number vs. Orbit type
    • Payload mass vs. Orbit type
    • Average launch success trend
  • Chart types used:
    • Scatter plots show the relationship between variables
    • Bar charts show comparison among discrete categories
    • Line graphs show trends in data over time
  • GitHub Repository

EDA with Data Visualization

12 of 48

  • Exploratory data analysis:
    • Unique launch sites and Launch sites beginning with ‘KSC’
    • Total payload mass for NASA (CRS)
    • Average payload mass by F9 v1.1
    • Date of first successful drone ship landing
    • Boosters with payload mass between 4000 and 6000
    • Total number of successful and failed missions
    • Booster versions carrying the max payload
    • Successful landing outcomes by unique categories
  •  GitHub Repository

EDA with SQL in Jupyter Notebook

13 of 48

  • All launch sites were marked using circles and labeled
  • Failed launch sites were colored red and successful launch sites were colored green
  • The launch sites having relatively high success rates were identified
  • The distance from several landmarks to the launch site was calculated and marked using lines
  •  GitHub Repository

Folium Interactive Map

14 of 48

  • Launch site drop-down list:
    • Added a drop-down list to enable launch site selection
  • Pie chart showing successful launches:
    • Added an interactive pie chart giving users the ability to view the count of successful and failed launches from each site
  • Slider of payload mass range
    • Added a slider to filter payload mass range
  • Scatter plot of payload mass and success rate per booster version:
    • Added a scatter plot showing the correlation between payload mass and launch success
  • GitHub Repository

Plotly Dash Dashboard

15 of 48

Predictive Analysis (Classification)

Imported .csv data file into a Pandas dataframe

Created a Numpy array containing the 'Class' column

Standardized the data by fitting and transforming it with StandardScaler

Split the data into training and testing sets using the train_test_split function

Created a GridSearchCV object with cv=10 to find the best parameters

Applied GridSearchCV to Logistic Regression, Support Vector Machine, Decision Tree, K-Nearest Neighbor

Calculated the accuracy of the test data using the .score() method for each model

Created and assessed the confusion matrix for each model

Found the model with the best parameters by comparing the model accuracy score
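The steps above can be sketched for one of the four models as follows. This is an illustration only: it uses a synthetic dataset from `make_classification` rather than the launch data, and the parameter grid, split size, and random seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature matrix and the 'Class' labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Standardize, then split, mirroring the order of the steps listed above.
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# Hypothetical grid; the project may have searched different values.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 4, 6],
    "min_samples_split": [2, 5],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=10)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)  # accuracy on held-out data
matrix = confusion_matrix(y_test, search.predict(X_test))
print(search.best_params_, test_accuracy)
print(matrix)
```

Repeating the same search with `LogisticRegression`, `SVC`, and `KNeighborsClassifier` and comparing `test_accuracy` gives the model comparison described above.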

16 of 48

  • Exploratory data analysis results
  • Interactive analytics demo in screenshots
  • Predictive analysis results

Results

17 of 48

Section 2

18 of 48

  • The chart below shows that as the flight number increases, the first stage is more likely to land successfully.
  • CCAFS LC-40 has a 60% success rate.
  • Both KSC LC-39A and VAFB SLC-4E have a 77% success rate.

Flight Number vs. Launch Site

19 of 48

  • The chart below shows that the higher the payload mass, the higher the success rate for each launch site.
  • For VAFB SLC-4E, no rockets were launched with a payload mass over 10,000 kg.
  • KSC LC-39A has a 100% success rate for payload mass under 5,500 kg.
  • More than half the flights were from the CCAFS SLC-40 launch site.

Payload vs. Launch Site

20 of 48

  • Orbits with 100% success rate:
    • ES-L1, GEO, HEO, SSO
  • Orbits with 0% success rate:
    • SO
  • Orbits with a 50% success rate:
    • GTO

Success Rate vs. Orbit Type

21 of 48

  • The chart below shows that:
    • For LEO, the success rate appears to increase with the flight number
    • For GTO, there is no apparent relationship between success and flight number

Flight Number vs. Orbit Type

22 of 48

MOST SUCCESSFUL Orbit: LEO, ISS, and PO

23 of 48

Launch Success Yearly Trend Upwards since 2013

24 of 48

  • Display the names of the unique launch sites in the space mission.
  • The DISTINCT keyword was used to show the list of unique launch sites.

All Launch Site Names

25 of 48

  • Find 5 records where launch sites’ names start with `KSC`
  • The WHERE clause was used to find specific launch site names

Launch Site Names Begin with 'KSC'
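The two launch-site queries can be sketched with an in-memory SQLite database. The table name `SPACEXTBL`, the column names, and the sample rows are assumptions made for this illustration, and the prefix match is written here with `LIKE 'KSC%'`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPACEXTBL (Launch_Site TEXT, Booster_Version TEXT)")
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?)",
    [  # hypothetical rows
        ("CCAFS LC-40", "F9 v1.0"),
        ("KSC LC-39A", "F9 FT"),
        ("VAFB SLC-4E", "F9 v1.1"),
        ("KSC LC-39A", "F9 B4"),
        ("CCAFS SLC-40", "F9 B5"),
    ],
)

# Unique launch sites.
unique_sites = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Launch_Site FROM SPACEXTBL ORDER BY Launch_Site"
    )
]

# Up to five records whose launch site starts with 'KSC'.
ksc_rows = conn.execute(
    "SELECT * FROM SPACEXTBL WHERE Launch_Site LIKE 'KSC%' LIMIT 5"
).fetchall()

print(unique_sites)
print(ksc_rows)
```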

26 of 48

  • Calculate the total payload carried by boosters from NASA
  • The SUM() function and WHERE clause were used to calculate this value

Total Payload Mass

27 of 48

  • Calculate the average payload mass carried by booster version F9 v1.1
  • The AVG() function and WHERE clause were used to calculate this value

Average Payload Mass by F9 v1.1
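A small sketch of the two aggregate queries (total and average payload mass) against an in-memory SQLite table. The table name, the column names, and the payload values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SPACEXTBL "
    "(Customer TEXT, Booster_Version TEXT, Payload_Mass_Kg REAL)"
)
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?, ?)",
    [  # hypothetical rows
        ("NASA (CRS)", "F9 v1.0", 500),
        ("NASA (CRS)", "F9 v1.1", 2000),
        ("SES", "F9 v1.1", 3000),
        ("NASA (CRS)", "F9 FT", 2500),
    ],
)

# Total payload mass carried for one customer.
total_nasa = conn.execute(
    "SELECT SUM(Payload_Mass_Kg) FROM SPACEXTBL WHERE Customer = 'NASA (CRS)'"
).fetchone()[0]

# Average payload mass carried by one booster version.
avg_v11 = conn.execute(
    "SELECT AVG(Payload_Mass_Kg) FROM SPACEXTBL WHERE Booster_Version = 'F9 v1.1'"
).fetchone()[0]

print(total_nasa, avg_v11)
```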

28 of 48

  • Find the dates of the first successful landing outcome on drone ship.
  • The LIKE operator was used to filter for this unique criterion

First Successful Ground Landing Date

29 of 48

  • List the names of boosters which have successfully landed on drone ship and had payload mass greater than 4000 but less than 6000
  • The LIKE and AND operators were used to filter for these unique criteria

Successful Drone Ship Landing with Payload between 4000 and 6000

30 of 48

  • Calculate the total number of successful and failure mission outcomes
  • A JOIN clause was used to combine the rows of the nested SQL query results

Total Number of Successful and Failure Mission Outcomes

31 of 48

  • List the names of the boosters that have carried the maximum payload mass
  • A subquery was used in the WHERE clause to filter for the specific criteria

Boosters Carrying Maximum Payload
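The subquery pattern can be illustrated as follows: the inner query finds the maximum payload mass and the outer WHERE clause keeps only the boosters that carried it. Table name, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPACEXTBL (Booster_Version TEXT, Payload_Mass_Kg REAL)")
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?)",
    [  # hypothetical rows; two boosters share the maximum
        ("F9 B5 B1048.4", 15600),
        ("F9 FT B1021.1", 3100),
        ("F9 B5 B1049.4", 15600),
        ("F9 v1.1", 2000),
    ],
)

max_payload_boosters = [
    row[0]
    for row in conn.execute(
        """
        SELECT Booster_Version
        FROM SPACEXTBL
        WHERE Payload_Mass_Kg = (SELECT MAX(Payload_Mass_Kg) FROM SPACEXTBL)
        ORDER BY Booster_Version
        """
    )
]
print(max_payload_boosters)
```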

32 of 48

  • List the records for the months in year 2017
  • A WHERE clause containing LIKE and AND operators was used to filter the records for this query

2017 Launch Records

33 of 48

  • Rank the count of successful landing_outcomes between the date 2010-06-04 and 2017-03-20 in descending order
  • The WHERE, GROUP BY and ORDER BY clauses were used to filter, group and sort the results of this query

Rank Landing Outcomes Between 2010-06-04 and 2017-03-20
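A sketch of the ranking query using an in-memory SQLite table. It assumes dates are stored as ISO `YYYY-MM-DD` text so that BETWEEN compares them correctly, and that successful outcomes start with `Success`; the table name, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SPACEXTBL (Date TEXT, Landing_Outcome TEXT)")
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?)",
    [  # hypothetical rows
        ("2012-05-22", "No attempt"),
        ("2015-12-22", "Success (ground pad)"),
        ("2016-01-17", "Failure (drone ship)"),
        ("2016-04-08", "Success (drone ship)"),
        ("2016-05-06", "Success (drone ship)"),
        ("2017-06-03", "Success (ground pad)"),  # outside the date range
    ],
)

ranked = conn.execute(
    """
    SELECT Landing_Outcome, COUNT(*) AS outcome_count
    FROM SPACEXTBL
    WHERE Date BETWEEN '2010-06-04' AND '2017-03-20'
      AND Landing_Outcome LIKE 'Success%'
    GROUP BY Landing_Outcome
    ORDER BY outcome_count DESC
    """
).fetchall()
print(ranked)
```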

34 of 48

Section 3

35 of 48

SpaceX launch sites are located in the states of Florida and California in the United States of America.

Launch Site Locations – Global Map

36 of 48

Success Rates at Launch Sites

37 of 48

Distance from CCAFS SLC-40 to Landmarks

The distance between CCAFS SLC-40 and:

  • the coastline is 0.90 km
  • the railway is 78.62 km
  • the highway is 29.21 km
  • a major city (Orlando) is 78.45 km
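Distances like these can be computed from latitude and longitude with the haversine formula. The sketch below is one possible implementation, not necessarily the one used for the map; the Earth radius value and the coordinates are approximate and assumed for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Approximate, assumed coordinates for the launch site and the city centre.
launch_site = (28.5632, -80.5768)
orlando = (28.5384, -81.3789)

distance_to_city = haversine_km(*launch_site, *orlando)
print(f"{distance_to_city:.2f} km")
```

The result depends on which point is chosen to represent each landmark, so small differences from the figures above are expected.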

38 of 48

Section 4

39 of 48

  • The above pie chart shows that KSC LC-39A is the most successful launch site

Launch Success Count for all Sites

40 of 48

  • KSC LC-39A has a launch success rate of 76.9%, which is the highest of all the launch sites.

Most Successful Launch Site

41 of 48

  • High Payload Mass
    • Above 5000kg

Low Payload Mass = High Success Rate

  • Low Payload Mass
    • Below 4000kg

42 of 48

Section 5

43 of 48

The most accurate classification algorithm for predicting a successful landing, from the models tested, is the Decision Tree Classifier (DTC) with an accuracy score of 0.889 and the following parameters:

  • criterion: gini
  • max depth: 6
  • max features: sqrt
  • min samples leaf: 1
  • min samples split: 2
  • splitter: best

Supervised Machine Learning Classification Accuracy

44 of 48

The confusion matrix for the DTC shows that the classifier can distinguish between the different landing classes.

The major problem is the false positives, i.e., unsuccessful landings marked as successful by the classifier.

Confusion Matrix for DTC
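To make the false-positive count concrete, the sketch below tallies the four cells of a binary confusion matrix by hand. The label vectors are hypothetical and do not reproduce the project's test set.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["tn" if truth == 0 else "fn"] += 1
    return counts

# Hypothetical labels: 1 = landed, 0 = did not land.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
print(counts, accuracy)
```

Here the two `fp` cases are failed landings that the classifier predicted as successful, which is the error type highlighted above.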

45 of 48

46 of 48

  • The findings from this assessment indicate:
    • The greater the number of flights at a launch site, the greater the success rate
    • Launches with a low payload mass have a higher success rate
    • The success rate of launches increases over time
    • Orbits ES-L1, GEO, HEO, SSO, VLEO had the highest success rate
    • Launch site KSC LC-39A had the most successful launches
    • The Decision tree classifier (DTC) is the best machine learning algorithm for predicting landing success

  • Cautions:
    • When using the DTC, be cautious of unsuccessful landings being classified as successful

  • Recommendations:
    • The competing startup should model its launches on those from KSC LC-39A, using relatively low payload mass and orbit types such as ES-L1, GEO, HEO, SSO or VLEO.
    • All else being equal, this should increase the likelihood of the first stage landing successfully and increase competition for SpaceX, eventually lowering the cost of space exploration.

Conclusions

47 of 48

  • Special thanks to:
    • My family, for their continued support
    • The edX instructors, for designing this course and project
    • My awesome peers, for being an available reference network
    • Last but not least, myself, for never quitting in the face of adversity 🥳

Appendix

48 of 48