I worked on the Google Data Analytics Capstone Project, Track 1, Case Study 1. I will be diving into the background, my full process of cleaning, analyzing and visualizing the data, along with my final suggestions and summary of the data.
Quick Links:
Tableau Dashboard | Github R Code for Analysis | Github R Code for Tableau Visualization | LinkedIn Post
BACKGROUND
Cyclistic is a bike sharing program which features more than 5,800 bikes and 600 docking stations. It offers reclining bikes, hand tricycles, and cargo bikes, making it more inclusive to people with disabilities and riders who can't use a standard two-wheeled bike. It was founded in 2016 and has grown tremendously into a fleet of bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Previously, Cyclistic's marketing strategy focused on building general awareness and appealing to broad consumer segments. It has flexible pricing plans: single-ride passes, full-day passes, and annual memberships. Those who purchase single-ride or full-day passes are referred to as casual riders while those who purchase annual memberships are Cyclistic members.
My Role: In this scenario, I am a junior data analyst at Cyclistic and my team has been tasked with the overall goal (see below) of designing marketing strategies.
Overall Goal: Design marketing strategies aimed at converting casual riders into annual members.
Business Question: "How do annual members and casual riders use Cyclistic bikes differently?"
Below I will describe, step by step, the process I used for this project. If you want to skip ahead to the business suggestions, move on to the section "BUSINESS SUGGESTIONS".
PROCESS:
Overview: I first analyzed the data separately (each month) in Excel, then used R to analyze the data as a whole (one year). Finally I created a dashboard in Tableau and used Figma to support the design elements.
Microsoft Excel
I initially wanted to gather and analyze my data in Excel because it was the tool I was most familiar with and I could get a general understanding of the data quicker. I did not combine all of the spreadsheets into one because that would've taken more processing power than my computer had.
I began by downloading the data from divvy-tripdata and turning the .csv files into Excel spreadsheets. I downloaded the most recent year of data available at the time I started my project:
August 2020
September 2020
October 2020
November 2020
December 2020
January 2021
February 2021
March 2021
April 2021
May 2021
June 2021
July 2021
Added two columns to each month's spreadsheet:
ride_length - calculated the length of each trip from the started_at and ended_at columns: ending time minus starting time.
day_of_week - calculated the day of the week for each trip from the date in the started_at column.
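As a rough sketch of the same two calculations in Python (not the Excel formulas used for the project), the columns could be derived as follows. The timestamp layout and the choice of minutes as the unit are assumptions.

```python
from datetime import datetime

# Assumed timestamp layout for started_at / ended_at, e.g. "2021-07-04 17:05:00".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def add_ride_columns(trip):
    """Return a copy of one trip row with ride_length and day_of_week added."""
    started = datetime.strptime(trip["started_at"], TIMESTAMP_FORMAT)
    ended = datetime.strptime(trip["ended_at"], TIMESTAMP_FORMAT)
    enriched = dict(trip)
    # ride_length: ending time minus starting time, expressed in minutes.
    enriched["ride_length"] = (ended - started).total_seconds() / 60
    # day_of_week: name of the day on which the ride started.
    enriched["day_of_week"] = DAY_NAMES[started.weekday()]
    return enriched


# Made-up example row, not taken from the dataset.
example = add_ride_columns(
    {"started_at": "2021-07-04 17:05:00", "ended_at": "2021-07-04 17:35:30"}
)
print(example["ride_length"], example["day_of_week"])
```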
Went over the business task and the information I had at hand, and how it could be used to figure out how members and casual riders use the bike service differently.
Came up with metrics to look at, such as:
total number of rides per hour, per day of the month, per season, per day of the week, and for different bike types
average ride length for members and casual riders
For every month, created pivot tables and accompanying charts in Excel for the following analyses (this took the longest):
Total Rides per Weekday - calculated the total rides for members and casual and separated it by day of the week; used a cluster column chart
Average Ride Length - calculated the average ride length for members and casual and separated it by day of the week; used a cluster column chart
Total Rides per Hour - calculated the total rides for members and casual separated by the time of the day (24hr); used a line comparison chart
Total Rides per Day - calculated the total rides for members and casual separated by the day of the month; used a line comparison chart
Total Rides per Bike Type - calculated the total rides for members and casual separated by Bike type; used stacked column chart
I also created a notes list in Google Docs where I wrote down the exact steps for each month (as a checklist) and included my insights for each month.
Time Spent:
535 minutes or just under 9 hours to complete.
R
I originally wanted to use SQL, but the files were too big to upload and I couldn't figure out how to use Google Cloud Platform. Instead I used R to analyze the data because it could handle all of the information more quickly than Excel, and I wanted to work on my R skills. Below is my general process in R; for the sake of brevity I left out my mistakes, missteps, and errors.
View my full code on my Github for this capstone project here.
Load all of the libraries I used: tidyverse, lubridate, hms, data.table
Loaded all of the original data from the data source, divvy-tripdata, into R using the read_csv function, saving each CSV file in its own data frame. The August 2020 data went into aug08_df, September 2020 into sep09_df, and so on.
Merged the 12 months of data together using rbind to create a one year view
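The load-and-merge step can be illustrated with a small Python sketch using only the standard library. This is not the read_csv and rbind code from the project; the function name combine_monthly_files and the in-memory sample rows are hypothetical stand-ins for the real monthly files.

```python
import csv
import io


def combine_monthly_files(monthly_files):
    """Stack the rows of several monthly CSV files into one list of dicts."""
    all_trips = []
    for monthly_file in monthly_files:
        # DictReader uses each file's header row as the column names.
        all_trips.extend(csv.DictReader(monthly_file))
    return all_trips


# Two tiny in-memory stand-ins for monthly files (made-up rows).
aug = io.StringIO("ride_id,member_casual\nA1,member\nA2,casual\n")
sep = io.StringIO("ride_id,member_casual\nB1,casual\n")

trips = combine_monthly_files([aug, sep])
print(len(trips))
```

With real files, each element of the list would be an open file handle instead of a StringIO object. Stacking like this only makes sense when every month shares the same columns.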
Created a new data frame called cyclistic_date that would contain all of my new columns
Created new columns for:
Ride Length - did this by subtracting the started_at time from the ended_at time
Day of the Week
Month
Day
Year
Time - convert the time to HH:MM:SS format
Hour
Season - Spring, Summer, Winter or Fall
Time of Day - Night, Morning, Afternoon or Evening
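The Season and Time of Day columns are simple lookups on the month and hour. The article does not state the exact cut-offs, so the boundaries in this Python sketch (meteorological seasons, and six-hour blocks starting at 6AM with night running from 11PM) are assumptions chosen for illustration only.

```python
def season_for_month(month):
    """Map a month number (1-12) to a season. Boundaries are assumed."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"


def time_of_day_for_hour(hour):
    """Map an hour (0-23) to a part of the day. Boundaries are assumed."""
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Afternoon"
    if 18 <= hour < 23:
        return "Evening"
    return "Night"


print(season_for_month(7), time_of_day_for_hour(17))
```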
Cleaned the data by:
Removing duplicate rows
Removing rows with NA values (blank rows)
Removing rows where ride_length is 0 or negative (ride_length should be a positive number)
Removing unnecessary columns: ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng
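The four cleaning rules above can be sketched in Python as a single pass over the rows. This is an illustration rather than the R code from the project; clean_trips is a hypothetical helper, and treating empty strings as missing values is an assumption.

```python
def clean_trips(trips, columns_to_drop):
    """Apply the cleaning rules to a list of trip dicts and return the kept rows."""
    seen = set()
    cleaned = []
    for trip in trips:
        # Drop rows with a missing (None or empty) value in any column.
        if any(value is None or value == "" for value in trip.values()):
            continue
        # Drop rides whose length is zero or negative.
        if trip["ride_length"] <= 0:
            continue
        # Drop exact duplicates of a row that was already kept.
        key = tuple(sorted(trip.items()))
        if key in seen:
            continue
        seen.add(key)
        # Keep only the columns that are still needed.
        cleaned.append({k: v for k, v in trip.items() if k not in columns_to_drop})
    return cleaned


# Made-up rows covering each rule.
rows = [
    {"ride_id": "A1", "member_casual": "member", "ride_length": 12.5},
    {"ride_id": "A1", "member_casual": "member", "ride_length": 12.5},  # duplicate
    {"ride_id": "A2", "member_casual": "", "ride_length": 8.0},         # missing value
    {"ride_id": "A3", "member_casual": "casual", "ride_length": -3.0},  # negative length
    {"ride_id": "A4", "member_casual": "casual", "ride_length": 29.8},
]
cleaned = clean_trips(rows, {"ride_id"})
print(len(cleaned))
```

Duplicates are detected on the full row before columns are dropped, so two different rides that look alike once ride_id is removed are both kept.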
Calculated Total Rides for:
Total number of rides which was just the row count = 4,152,139
Member type - casual riders vs. annual members
Type of Bike - classic vs docked vs electric; separated by member type and total rides for each bike type
Hour - separated by member type and total rides for each hour in a day
Time of Day - separated by member type and total rides for each time of day (morning, afternoon, evening, night)
Day of the Week - separated by member type and total rides for each day of the week
Day of the Month - separated by member type and total rides for each day of the month
Month - separated by member type and total rides for each month
Season - separated by member type and total rides for each season (spring, summer, fall, winter)
Calculated Average Ride Length for:
Total average ride length
Member type - casual riders vs. annual members
Type of Bike - separated by member type and average ride length for each bike type
Hour - separated by member type and average ride length for each hour in a day
Time of Day - separated by member type and average ride length for each time of day (morning, afternoon, evening, night)
Day of the Week - separated by member type and average ride length for each day of the week
Day of the Month - separated by member type and average ride length for each day of the month
Month - separated by member type and average ride length for each month
Season - separated by member type and average ride lengths for each season (spring, summer, fall, winter)
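Each of the breakdowns above is the same operation: group the rides by member type and one other column, then count the rows and average ride_length. A minimal Python sketch of that pattern follows. It is not the tidyverse code from the project; summarize is a hypothetical helper and the sample rows are made up.

```python
from collections import defaultdict


def summarize(trips, group_column):
    """Total rides and average ride length per (member type, group value)."""
    # Each entry holds [number of rides, summed ride length].
    totals = defaultdict(lambda: [0, 0.0])
    for trip in trips:
        key = (trip["member_casual"], trip[group_column])
        totals[key][0] += 1
        totals[key][1] += trip["ride_length"]
    return {
        key: {"total_rides": count, "avg_ride_length": length_sum / count}
        for key, (count, length_sum) in totals.items()
    }


# Made-up rows for illustration.
sample = [
    {"member_casual": "casual", "day_of_week": "Saturday", "ride_length": 30.0},
    {"member_casual": "casual", "day_of_week": "Saturday", "ride_length": 40.0},
    {"member_casual": "member", "day_of_week": "Saturday", "ride_length": 10.0},
    {"member_casual": "member", "day_of_week": "Wednesday", "ride_length": 12.0},
    {"member_casual": "member", "day_of_week": "Wednesday", "ride_length": 16.0},
]
by_weekday = summarize(sample, "day_of_week")
print(by_weekday[("casual", "Saturday")])
```

Swapping "day_of_week" for another column name (hour, month, season, bike type) gives the other breakdowns without changing the function.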
Then using all of this data I created my own summary in my case notes and took note of the: total rides for each variable, average ride lengths for each variable, and the difference between members versus casual riders. I originally wanted to create a report using R Markdown as well but for the sake of time (I had already spent over 20 hours on the project so far), I decided to skip this step, and write this article instead.
Time Spent:
1045 minutes or about 17 and a half hours to complete.
Tableau
While I learned the basics of Tableau in the Google Course I wanted more practice with visualizing data and creating dashboards.
To view my completed dashboard click here.
I created a separate R code (you can view it here on Github) that made some changes for specifically the Tableau portion.
For ride length I rounded to one decimal place, so my numbers looked like 29.8 or 12.5.
Revised how I created my "month" column. I used mutate() to create a column that held the month as a name rather than a number, so instead of 01 it would say "January".
Cleaned the data: removed rows with NA values, removed duplicate rows, removed rows where ride_length was 0 or negative, and removed unnecessary columns: ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng
Created a new dataframe with this information so I could test the difference between the original data frame (cyclistic_date) that I used for my analysis and the data frame I would use for Tableau (cyclistic_tableau).
In this new data frame I removed more columns to make calculations quicker in Tableau. I removed: start_station_name, end_station_name, time, started_at, ended_at
Downloaded this data frame into a .csv file which I uploaded to Tableau
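The Tableau preparation steps (rounding, month names, dropping extra columns, exporting to .csv) can be sketched in Python as follows. This is an illustration, not the R script from the project; prepare_for_tableau is a hypothetical helper, and the sample row and its month column stored as a two-digit string are assumptions.

```python
import csv
import io

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]

# Columns removed for the Tableau copy, as listed in the steps above.
TABLEAU_DROP = {"start_station_name", "end_station_name", "time",
                "started_at", "ended_at"}


def prepare_for_tableau(trip):
    """Return a slimmed-down copy of one trip row for the Tableau export."""
    slim = {k: v for k, v in trip.items() if k not in TABLEAU_DROP}
    # Round ride length to one decimal place.
    slim["ride_length"] = round(trip["ride_length"], 1)
    # Replace the month number ("01") with its name ("January").
    slim["month"] = MONTH_NAMES[int(trip["month"]) - 1]
    return slim


# Made-up row for illustration.
row = prepare_for_tableau({
    "member_casual": "casual",
    "ride_length": 29.84,
    "month": "01",
    "time": "17:05:00",
    "started_at": "2021-01-09 17:05:00",
})

# Write the result as CSV text (a real export would use an open file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```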
Created graphs similar to those I created in Excel but added a few:
User Type
Total Rides by Bike Type
Ride Length by Weekday
Total Rides by Weekday
Total Rides by Hour
Total Rides by Month
Then I created a basic dashboard with all of that information, a prototype for me to view while I was creating the final dashboard (Figure 1 below).
Created a prototype mockup in Figma
Created a final version of the mockup in Figma
Edited Dashboard in Tableau to reflect design in Figma
Edited graphs in Tableau
Made bar graphs round
Added annotations
Highlights to specific important notes
Got rid of labels for visual purposes
Combined Figma and Tableau (used dashboard created in Figma as the background for my Tableau Dashboard) to create a final prototype (Figure 2 below)
Made minor edits to design elements and created final dashboard (Figure 3 - Cyclistic Dashboard V1)
On April 24, 2023 I decided to update my dashboard (see Finished Project, image Final Dashboard - Cyclistic Dashboard V2). All of the analysis is the same. The only changes are to the dashboard, and they include:
Adding horizontal grid lines to a few of the charts
Updating the tool tips.
Making all of the top metric values (e.g. Total Rides, Average Ride Length, etc.) interactive in Tableau instead of in Figma.
Time Spent:
765 minutes or almost 13 hours to complete.
Tableau Prototype
Below was my first draft of the dashboard only using Tableau.
Prototype using Figma Background
Combined Figma and Tableau (used dashboard created in Figma as the background for my Tableau Dashboard) to create a final prototype.
Final Dashboard V1
Made minor edits to design elements and created final dashboard. This was the original final dashboard.
Misc.
I am including the other tools I used.
Figma to create my background and help develop the dashboard aesthetics.
Google Docs helped me keep track of all of my documents for this project like:
Date Log - I wrote down what I did that day related to my project
Resources - A list of resources I frequently used
Case Notes - Notes for the case study including the final insights, what I was looking for, and anything else having to do with the case
Evernote to draft this article before I uploaded it here.
FINISHED PROJECT
Here is my finished project: Google Capstone Project (V2). You can view the links to my R code on Github used for analysis here and the code for Tableau here.
Note: This is V2, with a few minor changes to the dashboard, including:
Adding horizontal grid lines to a few of the charts
Updating the tool tips.
Making all of the top metric values (e.g. Total Rides, Average Ride Length, etc.) interactive in Tableau instead of in Figma.
SUMMARY OF DATA
Those who purchase single-ride or full-day passes are referred to as casual riders while those who purchase annual memberships are Cyclistic members.
Data:
Total Rides by User Type
Members had more rides, with 2,328,763 total rides or 56%, while casual riders had 1,823,376 total rides or 44%.
Total Rides per Bike Type
Both casual riders and members used the classic bike the most with 1,777,593 rides or 43% of total rides, followed by docked bikes with 1,545,936 rides or 37% of total rides, and lastly electric bikes with 828,610 rides or 20% of total rides.
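The percentages follow directly from the ride counts. As a quick check in Python, using only the bike-type figures quoted above and rounding to the nearest whole percent:

```python
# Ride counts per bike type, as quoted in the summary above.
rides_by_bike_type = {
    "classic": 1_777_593,
    "docked": 1_545_936,
    "electric": 828_610,
}

total_rides = sum(rides_by_bike_type.values())

# Share of total rides per bike type, rounded to a whole percent.
shares = {
    bike: round(100 * rides / total_rides)
    for bike, rides in rides_by_bike_type.items()
}

print(total_rides)  # 4152139
print(shares)       # {'classic': 43, 'docked': 37, 'electric': 20}
```

The three counts add up to the 4,152,139 total rides reported from the row count.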
Average Ride Length by User Type
The total average ride length was 24 minutes. For casual riders it was longer, at 37 minutes, while for members it was 14 minutes.
Average Ride Length per Weekday
For both casual riders and members, the average ride length increased on the weekends. For both, Sunday was the longest at 31 minutes.
Total Rides by Weekday
Saturday was the most popular day of the week combining casual riders and member rides, with 784,239 rides or 19% of total rides. But for member rides only, Wednesday was the most popular day with 356,060 rides, 5,407 rides more than Saturday.
Total Rides by Hour
5PM (17:00) was the busiest hour for both members and casual riders, with 426,685 rides or 10% of the total rides. Typically rides began increasing in the morning at 6AM, rose until 5PM, then dropped afterwards. The afternoon was the busiest time of day for both rider types, with 1,905,797 rides or 46% of total rides. 4AM was the least popular hour.
Total Rides by Month
July was the busiest month combining casual riders and member rides, at 691,476 rides or 17% of total rides, while summer was the most popular season for both at 1,903,446 rides or 46% of total rides. Looking at just members, August was actually the busiest month with 323,140 rides, 816 rides more than July. Winter was the least popular season and February was the least popular month.
Final Summary
The most popular bike among riders was the classic.
The busiest time of day was the afternoon, and the peak hour was 5PM for both casual riders and members.
The busiest day of the week was Saturday; casual riders used the service the most on the weekends.
The busiest season was summer for both types of riders.
Members took the most rides, but casual riders weren't far behind.
The average ride length was 24 minutes but casual riders on average rode 23 minutes longer than members.
BUSINESS SUGGESTIONS
This was the hardest part for me for the whole project. I have never provided suggestions for a business nor worked in marketing. Any feedback here would be appreciated.
These are my suggestions for the marketing team to convert casual riders to annual members:
Personalize discounts and show perks in the membership program based on their preferences and riding habits.
Emphasize the benefits of memberships, including discounts during busy times of the year like during Summer, or on the weekends.
Have existing members share their stories about how using Cyclistic's system has changed their lives, to create a sense of community. Offer them a discount for doing so; this will help encourage new riders to join the program.
WHAT I LEARNED
Below is what I learned/practiced from over 40 hours spent on this project:
Pivot Tables in Microsoft Excel
Using R for data cleaning and analysis, specifically with the tidyverse package
Building graphs in Tableau, editing visual elements, and creating different charts and filters
Design elements of an effective dashboard
Combining the design feature of Figma with the functionality of Tableau
RESOURCES
For the R portion of my project, I found Itamar's case study on Kaggle, which also uses R, a helpful resource.
For the Tableau portion, I used Navneet Singh's Tableau Dashboard as inspiration.