1 of 6

Google Data Analytics Capstone Project

Tools used: Excel, Sheets, R and Python, SQL

2 of 6

ASK

Problem Statement: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Cycling company called Cyclistic is interested in analyzing the Cyclistic historical bike trip data to identify trends.

Questions will guide the future marketing program:

  • How do annual members and casual riders use Cyclistic bikes differently?
    • Consider distance travelled which would be estimated using time rather than latitude
  • Why would casual riders buy Cyclistic annual memberships?
    • Tourism, more distance perhaps, attractions and sighseeing
  • How digital media could affect their marketing tactics?

Questions that don’t have anything to do with data is answered here

All other questions regarding data is inquired

Click on the icons to navigate slides

3 of 6

PROCESS

Data is cleaned and put into Excel / Spreadsheets after being filtered using SQL

  • SQL is first used to clean the data and remove any data that are not part of member riders or casual riders -
  • After filtering the data, the data is then put into spreadsheets where they are analyzed more closely (the dataset contains information on roughly 430,000 cyclists who will be filtered down even further in spreadsheets.)

  • This SQL code is all that is needed to filter through 430,000 cyclists. This code SELECTS all the data FROM the BikingData table WHERE the member_casual column is either a member or a casual

SELECT * �FROM BikingData�WHERE member_casual = "member" OR member_casual = "casual"

ride_id

rideable_type

started_at

ended_at

start_station_name

start_station_id

end_station_name

end_station_id

start_lat

start_lng

end_lat

end_lng

member_casual

EACB19130B0CDA4A

docked_bike

2020-01-21 20:06:59

2020-01-21 20:14:30

Western Ave & Leland Ave

239

Clark St & Leland Ave

326

41.9665

-87.6884

41.9671

-87.6674

member

- Sample Google Sheets with all the column labels (linked to the actual spreadsheet). Most important columns highlighted

Click on the icons to navigate slides

4 of 6

ANALYZE

Processing on google sheets -- using cell formulas, the date was removed and the times were extracted in a different cell in order to obtain the time it took for riders. Data also reduced significantly down to 400 riders.

Answers to the Questions based on Analysis:

  • Q1: How do annual members and casual riders use Cyclistic bikes differently?
  • Answer: The annual members and casual riders times were measured using the excel formula (=time(hour(col), minute(col), sec(col)) to remove all the dates and then formatted to 24 hours to get the difference.
  • Difference showed us that members are far more likely to ride the bicycles for much shorter than casual bikers who ride longer.
  • Casuals go farther in distance, spend more time on the bikes and it seems that they are usually tourists around the city..
  • Q2: Why would casual riders buy Cyclistic annual memberships?
  • Answer: Casual riders would buy annual memberships if they want to be serious riders and moving into the city
  • In terms of the data provided, any rider that rides for over 10 minutes is considered a serious rider
  • An analytical assumption to be made based on data about casual riders with times over an hour is that they will most likely opt to be members because they visibly want to explore the city or they are just tourists with no intentions of moving in.
  • Q3: How digital media could affect their marketing tactics.
  • Answer: The promotion of services through members who are already riding a lot
  • Appeal more to casual riders since those who ride for longer are usually more likely to be tourists

Link to Filtered Spreadsheet

Data is further filtered through weeks and is narrowed down to a good sample size from the overall population of 430,000 riders.

Click on the icons to navigate slides

5 of 6

SHARE

Data is shared and visualized through R and Spreadsheets

Click on the icons to navigate slides

In order to show the differences between each rider, R (programming language) and some Python was used to display the difference in times between casual riders vs member riders. The average member riding time in seconds was about 599 seconds, which is about 10 minutes as compared to members who rode on average 2082 seconds which is about 35 minutes. Spreadsheets and excels were then used to share a frequency plot to give an idea of all the scatter start times and end times

6 of 6

ACT

Click on the icons to navigate slides

What can cyclistic do with this information ?

  • Present further information on Tableau and make interactive dashboards showing unfiltered data
  • Use the data gathered and try to cater to more casual riders
    • casual riders use bikes the longest -- why?
  • Gather more information on most commonly used biking stations
  • Use advertising and marketing to cater

Spreadsheet Links

Github Link for the code

  • Link to github

Data is acted upon now and research will be taken further