Google Data Analytics Capstone Project
Tools used: Excel, Google Sheets, R, Python, SQL
ASK
Problem Statement: Design marketing strategies aimed at converting casual riders into annual members. To do that, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect the team's marketing tactics. Cyclistic, a cycling company, wants to analyze its historical bike trip data to identify trends.
Questions that will guide the future marketing program:
Questions that don't have anything to do with the data are answered here
All other questions regarding the data are investigated here
Click on the icons to navigate slides
PROCESS
The data is filtered using SQL, then cleaned and loaded into Excel / Google Sheets.
SELECT *
FROM BikingData
WHERE member_casual = "member" OR member_casual = "casual"
ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual |
EACB19130B0CDA4A | docked_bike | 2020-01-21 20:06:59 | 2020-01-21 20:14:30 | Western Ave & Leland Ave | 239 | Clark St & Leland Ave | 326 | 41.9665 | -87.6884 | 41.9671 | -87.6674 | member |
- Sample row from the Google Sheet with all the column labels (linked to the actual spreadsheet). The most important columns are highlighted.
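The SQL filter above can be tried out locally. The following is a minimal sketch, assuming an in-memory SQLite table as a stand-in for BikingData; the original database engine is not stated, only three of the columns are kept, and every row except the sample one is made up for illustration.

```python
import sqlite3

# Hypothetical in-memory stand-in for the BikingData table (assumption:
# the real table lives elsewhere and has all thirteen columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BikingData (ride_id TEXT, rideable_type TEXT, member_casual TEXT)"
)
conn.executemany(
    "INSERT INTO BikingData VALUES (?, ?, ?)",
    [
        ("EACB19130B0CDA4A", "docked_bike", "member"),   # row from the sample sheet
        ("HYPOTHETICAL0001", "docked_bike", "casual"),   # made-up row
        ("HYPOTHETICAL0002", "docked_bike", None),       # made-up row, missing label
        ("HYPOTHETICAL0003", "docked_bike", "Member "),  # made-up row, malformed label
    ],
)

# Same condition as the query above, written with IN and single-quoted strings.
# ORDER BY is added only to make the output order deterministic.
rows = conn.execute(
    "SELECT * FROM BikingData "
    "WHERE member_casual IN ('member', 'casual') "
    "ORDER BY ride_id"
).fetchall()

kept_ids = [row[0] for row in rows]
print(kept_ids)
```

Only rows whose member_casual value is exactly "member" or "casual" survive, so rows with a missing or malformed label are dropped.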
ANALYZE
Processing in Google Sheets: using cell formulas, the date was stripped from each timestamp and the times were extracted into a separate cell in order to obtain the duration of each ride. The data was also reduced significantly, down to 400 riders.
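The duration calculation can also be sketched outside the spreadsheet. This is a minimal Python illustration, not the cell formulas actually used; the function name ride_duration_seconds is hypothetical, and the timestamp format is assumed from the sample row.

```python
from datetime import datetime

# Format of started_at / ended_at as they appear in the sample row.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def ride_duration_seconds(started_at: str, ended_at: str) -> int:
    """Return the ride duration in whole seconds (hypothetical helper)."""
    start = datetime.strptime(started_at, TIMESTAMP_FORMAT)
    end = datetime.strptime(ended_at, TIMESTAMP_FORMAT)
    return int((end - start).total_seconds())


# Timestamps taken from the sample row in the sheet.
sample_duration = ride_duration_seconds("2020-01-21 20:06:59", "2020-01-21 20:14:30")
print(sample_duration)  # 451 seconds, i.e. 7 min 31 s
```

Subtracting full timestamps rather than times alone keeps the result correct for rides that cross midnight, which is something to watch for when the date is removed first.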
Answers to the questions, based on the analysis:
The data is further filtered by week and narrowed down to a good sample size from the overall population of 430,000 riders.
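One way to narrow the data by week and draw a sample is sketched below. How the sample was actually selected is not stated, so the seeded random draw is an assumption, as are the ISO week numbering, the helper names, and the made-up rides.

```python
import random
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def iso_year_week(timestamp: str) -> tuple:
    """Return (ISO year, ISO week number) for a started_at string."""
    iso = datetime.strptime(timestamp, TIMESTAMP_FORMAT).isocalendar()
    return iso[0], iso[1]


def sample_rides_for_week(rides, year, week, sample_size, seed=0):
    """Keep rides that started in the given ISO week, then draw a random sample.

    Assumption: a seeded random sample stands in for whatever selection
    was actually done in the spreadsheet.
    """
    in_week = [r for r in rides if iso_year_week(r["started_at"]) == (year, week)]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(in_week, min(sample_size, len(in_week)))


# Made-up rides, one per day from 2020-01-13 to 2020-01-26 (ISO weeks 3 and 4).
hypothetical_rides = [
    {"ride_id": f"HYPOTHETICAL{day:04d}", "started_at": f"2020-01-{day:02d} 08:00:00"}
    for day in range(13, 27)
]

week_sample = sample_rides_for_week(hypothetical_rides, 2020, 4, sample_size=3)
print([ride["ride_id"] for ride in week_sample])
```

With the real data, sample_size would be closer to the 400 riders mentioned earlier.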
SHARE
The data is shared and visualized using R and spreadsheets.
To show the differences between the two rider types, R and some Python were used to display the difference in ride times between casual riders and members. The average riding time for members was about 599 seconds (about 10 minutes), compared with casual riders, who rode on average 2082 seconds (about 35 minutes). Google Sheets and Excel were then used to share a frequency plot that gives an idea of how the start times and end times are distributed.
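The comparison of averages can be sketched as follows. This is not the project's actual R or Python code: the helper name is hypothetical and the per-ride durations are made up, chosen only so that the group means reproduce the reported figures, taking 599 s as the member average and 2082 s as the casual average.

```python
from collections import defaultdict
from statistics import mean


def mean_duration_by_rider_type(rides):
    """Group (rider_type, duration_seconds) pairs and average each group."""
    durations = defaultdict(list)
    for rider_type, seconds in rides:
        durations[rider_type].append(seconds)
    return {rider_type: mean(values) for rider_type, values in durations.items()}


# Made-up durations in seconds, picked so the means equal 599 and 2082.
hypothetical_rides = [
    ("member", 451),
    ("member", 747),
    ("casual", 1800),
    ("casual", 2364),
]

averages = mean_duration_by_rider_type(hypothetical_rides)

# Convert the averages from seconds to minutes, rounded to one decimal place.
averages_in_minutes = {
    rider_type: round(seconds / 60, 1) for rider_type, seconds in averages.items()
}
print(averages_in_minutes)
```

On those figures, the average casual ride lasts roughly three and a half times as long as the average member ride.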
ACT
What can Cyclistic do with this information?
Spreadsheet Links
GitHub link for the code
The data is now acted upon, and the research will be taken further.