Kelly Adams

Google Data Analytics Capstone Project

Updated: Jul 5, 2023

I worked on the Google Data Analytics Capstone Project, Track 1, Case Study 1. I will be diving into the background, my full process of cleaning, analyzing and visualizing the data, along with my final suggestions and summary of the data.



BACKGROUND

Cyclistic is a bike-sharing program that features more than 5,800 bikes and 600 docking stations. It offers reclining bikes, hand tricycles, and cargo bikes, making it more inclusive to people with disabilities and riders who can't use a standard two-wheeled bike. It was founded in 2016 and has grown tremendously into a fleet of bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time.


Previously, Cyclistic's marketing strategy focused on building general awareness and appealing to broad consumer segments. It has flexible pricing plans: single-ride passes, full-day passes, and annual memberships. Those who purchase single-ride or full-day passes are referred to as casual riders, while those who purchase annual memberships are Cyclistic members.

My Role: In this scenario I am a junior data analyst at Cyclistic, and my team has been tasked with the overall goal (see below) of designing marketing strategies.


Overall Goal: Design marketing strategies aimed at converting casual riders into annual members.


Business Question: "How do annual members and casual riders use Cyclistic bikes differently?"


Below I will describe, step by step, the process I used for this project. If you want to skip ahead to the business suggestions, move on to the section "Business Suggestions".


 

PROCESS:

Overview: I first analyzed the data separately (each month) in Excel, then used R to analyze the data as a whole (one year). Finally I created a dashboard in Tableau and used Figma to support the design elements.


Microsoft Excel

I initially wanted to gather and analyze my data in Excel because it was the tool I was most familiar with and I could get a general understanding of the data quicker. I did not combine all of the spreadsheets into one because that would've taken more processing power than my computer had.

  1. I began by downloading the data from divvy-tripdata and turning the .csv files into Excel spreadsheets. I downloaded the most recent year of data available at the time I started my project:

    1. August 2020

    2. September 2020

    3. October 2020

    4. November 2020

    5. December 2020

    6. January 2021

    7. February 2021

    8. March 2021

    9. April 2021

    10. May 2021

    11. June 2021

    12. July 2021

  2. Added two columns to all of the months:

    1. ride_length calculated the total ride length for each trip using the started_at and ended_at columns: ending time minus starting time.

    2. day_of_week calculated the day of the week for each trip using the date in the started_at column.

  3. Went over the business task and the information I had at hand, and how it could be used to figure out how members and casual riders use the bike service differently.

  4. Came up with metrics to look at, such as:

    1. total number of rides per hour, per day of the month, per season, per day of the week, and for different bike types

    2. Average ride length between members and casual

  5. For every month, created pivot tables in Excel and charts to go with the analysis (this took the longest):

    1. Total Rides per Weekday - calculated the total rides for members and casual and separated it by day of the week; used a clustered column chart

    2. Average Ride Length - calculated the average ride length for members and casual and separated it by day of the week; used a clustered column chart

    3. Total Rides per Hour - calculated the total rides for members and casual separated by the time of the day (24hr); used a line comparison chart

    4. Total Rides per Day - calculated the total rides for members and casual separated by the day of the month; used a line comparison chart

    5. Total Rides per Bike Type - calculated the total rides for members and casual separated by Bike type; used stacked column chart

  6. I also created a notes list in Google Docs where I wrote down the exact steps for each month (as a checklist) and included my insights for each month

Time Spent:

535 minutes or just under 9 hours to complete.
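
As a rough illustration of the two added columns, here is a minimal Python sketch (not the Excel formulas actually used). The timestamp layout, the choice of minutes as the unit, and the sample row are assumptions made for the example; only the column names started_at, ended_at, ride_length and day_of_week come from the steps above.

```python
from datetime import datetime

# Assumed layout of the started_at / ended_at values.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def add_trip_columns(trip):
    """Return a copy of one trip row with ride_length and day_of_week added."""
    started = datetime.strptime(trip["started_at"], TIMESTAMP_FORMAT)
    ended = datetime.strptime(trip["ended_at"], TIMESTAMP_FORMAT)
    enriched = dict(trip)
    # ride_length: ending time minus starting time, expressed in minutes.
    enriched["ride_length"] = (ended - started).total_seconds() / 60
    # day_of_week: weekday name taken from the start date.
    enriched["day_of_week"] = started.strftime("%A")
    return enriched


# Hypothetical sample row, not taken from the real data.
sample_trip = {
    "started_at": "2021-07-03 14:05:00",
    "ended_at": "2021-07-03 14:35:30",
    "member_casual": "casual",
}
print(add_trip_columns(sample_trip))
```

Keeping ride_length as a plain number of minutes makes it easy to average later, which is what the pivot tables need.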


R

I originally wanted to use SQL but the files were too big to upload and I couldn't figure out how to utilize Google Cloud Platform. Instead I used R to analyze the data because it could handle all of the information quicker than Excel, and I wanted to work on my R skills. Below is my general process in R. For the sake of brevity, I didn't include my mistakes, missteps, or errors.

View my full code on my GitHub for this capstone project here.

  1. Load all of the libraries I used: tidyverse, lubridate, hms, data.table

  2. Loaded all of the original data from the data source, divvy-tripdata, into R, using the read_csv function to read each individual .csv file into its own data frame. I saved the August 2020 data in aug08_df, the September 2020 data in sep09_df, and so on.

  3. Merged the 12 months of data together using rbind to create a one year view

  4. Created a new data frame called cyclistic_date that would contain all of my new columns

  5. Created new columns for:

    1. Ride Length - did this by subtracting the started_at time from the ended_at time

    2. Day of the Week

    3. Month

    4. Day

    5. Year

    6. Time - convert the time to HH:MM:SS format

    7. Hour

    8. Season - Spring, Summer, Winter or Fall

    9. Time of Day - Night, Morning, Afternoon or Evening

  6. Cleaned the data by:

    1. Removing duplicate rows

    2. Removing rows with NA values (blank rows)

    3. Removing rows where ride_length is 0 or negative (ride_length should be a positive number)

    4. Removing unnecessary columns: ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng

  7. Calculated Total Rides for:

    1. Total number of rides which was just the row count = 4,152,139

    2. Member type - casual riders vs. annual members

    3. Type of Bike - classic vs docked vs electric; separated by member type and total rides for each bike type

    4. Hour - separated by member type and total rides for each hour in a day

    5. Time of Day - separated by member type and total rides for each time of day (morning, afternoon, evening, night)

    6. Day of the Week - separated by member type and total rides for each day of the week

    7. Day of the Month - separated by member type and total rides for each day of the month

    8. Month - separated by member type and total rides for each month

    9. Season - separated by member type and total rides for each season (spring, summer, fall, winter)

  8. Calculated Average Ride Length for:

    1. Total average ride length

    2. Member type - casual riders vs. annual members

    3. Type of Bike - separated by member type and average ride length for each bike type

    4. Hour - separated by member type and average ride length for each hour in a day

    5. Time of Day - separated by member type and average ride length for each time of day (morning, afternoon, evening, night)

    6. Day of the Week - separated by member type and average ride length for each day of the week

    7. Day of the Month - separated by member type and average ride length for each day of the month

    8. Month - separated by member type and average ride length for each month

    9. Season - separated by member type and average ride lengths for each season (spring, summer, fall, winter)
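
The combining, column-creation and cleaning steps above can be sketched in Python as follows. This is not the R code from the project (that is on GitHub); it is a minimal illustration under stated assumptions. In particular, the season boundaries (December to February as winter, and so on), the six-hour time-of-day bins, the timestamp layout, and the sample rows are all assumptions, and the sketch cleans and derives columns in a single pass rather than as separate steps.

```python
from datetime import datetime
from itertools import chain

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def season_for(month):
    """Map a month number to a season (assumed boundaries)."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"


def time_of_day_for(hour):
    """Map an hour (0-23) to a time of day (assumed six-hour bins)."""
    if hour < 6:
        return "Night"
    if hour < 12:
        return "Morning"
    if hour < 18:
        return "Afternoon"
    return "Evening"


def combine_clean_enrich(monthly_tables):
    """Stack monthly row lists, drop bad rows, and add the derived columns."""
    seen = set()
    cleaned = []
    for row in chain.from_iterable(monthly_tables):
        # Skip rows with any missing value.
        if any(value in (None, "") for value in row.values()):
            continue
        # Skip exact duplicate rows.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        started = datetime.strptime(row["started_at"], TIMESTAMP_FORMAT)
        ended = datetime.strptime(row["ended_at"], TIMESTAMP_FORMAT)
        ride_length = (ended - started).total_seconds() / 60
        # Skip rides whose length is zero or negative.
        if ride_length <= 0:
            continue
        cleaned.append({
            **row,
            "ride_length": ride_length,
            "day_of_week": started.strftime("%A"),
            "month": started.month,
            "day": started.day,
            "year": started.year,
            "time": started.strftime("%H:%M:%S"),
            "hour": started.hour,
            "season": season_for(started.month),
            "time_of_day": time_of_day_for(started.hour),
        })
    return cleaned


# Hypothetical rows standing in for two monthly files.
good = {"rideable_type": "docked_bike", "member_casual": "casual",
        "started_at": "2020-08-15 17:10:00", "ended_at": "2020-08-15 17:40:00"}
backwards = {"rideable_type": "docked_bike", "member_casual": "member",
             "started_at": "2020-09-01 09:00:00", "ended_at": "2020-09-01 08:50:00"}
blank = {"rideable_type": "electric_bike", "member_casual": "",
         "started_at": "2020-09-02 09:00:00", "ended_at": "2020-09-02 09:20:00"}
rides = combine_clean_enrich([[good, dict(good)], [backwards, blank]])
print(len(rides), rides[0]["season"], rides[0]["time_of_day"])
```

Only one of the four sample rows survives: the duplicate, the ride that ends before it starts, and the row with a blank member type are all dropped.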

Then, using all of this data, I created my own summary in my case notes and took note of the total rides for each variable, the average ride lengths for each variable, and the difference between members and casual riders. I originally wanted to create a report using R Markdown as well, but for the sake of time (I had already spent over 20 hours on the project so far) I decided to skip this step and write this article instead.
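
The total-rides and average-ride-length calculations all follow the same pattern: group by member type plus one other variable, then count rows and average ride_length. A minimal Python sketch of that pattern is below. The function name summarize and the sample rows are hypothetical; the column names member_casual, ride_length and day_of_week follow the ones used in this article.

```python
from collections import defaultdict


def summarize(rows, group_column):
    """Total rides and average ride length per (member type, group value)."""
    running = defaultdict(lambda: [0, 0.0])  # [ride count, summed ride_length]
    for row in rows:
        key = (row["member_casual"], row[group_column])
        running[key][0] += 1
        running[key][1] += row["ride_length"]
    return {
        key: {"total_rides": count, "avg_ride_length": total / count}
        for key, (count, total) in running.items()
    }


# Hypothetical cleaned rows, not taken from the real data.
sample_rides = [
    {"member_casual": "casual", "day_of_week": "Saturday", "ride_length": 20.0},
    {"member_casual": "casual", "day_of_week": "Saturday", "ride_length": 40.0},
    {"member_casual": "member", "day_of_week": "Wednesday", "ride_length": 12.0},
]
for key, stats in summarize(sample_rides, "day_of_week").items():
    print(key, stats)
```

Swapping "day_of_week" for another column such as hour, month or season gives the other breakdowns in the lists above.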

Time Spent:

1045 minutes or about 17 and a half hours to complete.


Tableau

While I learned the basics of Tableau in the Google Course I wanted more practice with visualizing data and creating dashboards.

To view my completed dashboard click here.

  1. I created a separate R script (you can view it here on GitHub) that made some changes specifically for the Tableau portion.

    1. Rounded ride length to one decimal place, so my numbers looked like 29.8 or 12.5.

    2. Revised how I created my "month" column. I used mutate() to create a column that held the month as a name rather than a number, so instead of 01 it would say "January".

    3. Cleaned the data: removed rows with NA values, removed duplicate rows, removed rows where ride_length was 0 or negative, and removed unnecessary columns like: ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng

    4. Created a new dataframe with this information so I could test the difference between the original data frame (cyclistic_date) that I used for my analysis and the data frame I would use for Tableau (cyclistic_tableau).

    5. In this new data frame I removed more columns to make calculations quicker in Tableau. I removed: start_station_name, end_station_name, time, started_at, ended_at

    6. Exported this data frame to a .csv file, which I uploaded to Tableau

  2. Created graphs similar to those I created in Excel but added a few:

    1. User Type

    2. Total Rides by Bike Type

    3. Ride Length by Weekday

    4. Total Rides by Weekday

    5. Total Rides by Hour

    6. Total Rides by Month

  3. Then I created a basic dashboard with all of that information, a prototype for me to view while I was creating the final dashboard (Figure 1 below).

  4. Created a prototype mockup in Figma

  5. Created a final version of the mockup in Figma

  6. Edited Dashboard in Tableau to reflect design in Figma

  7. Edited graphs in Tableau

    1. Made bar graphs round

    2. Added annotations

    3. Added highlights to specific important notes

    4. Got rid of labels for visual purposes

  8. Combined Figma and Tableau (used dashboard created in Figma as the background for my Tableau Dashboard) to create a final prototype (Figure 2 below)

  9. Made minor edits to design elements and created final dashboard (Figure 3 - Cyclistic Dashboard V1)

  10. On April 24, 2023 I decided to update my dashboard (see Finished Project, image Final Dashboard - Cyclistic Dashboard V2). All of the analysis is the same; the only changes were to the dashboard, and they include:

    1. Adding horizontal grid lines to a few of the charts

    2. Updating the tool tips.

    3. Making all of the top metric values (e.g. Total Rides, Average Ride Length, etc.) interactive in Tableau instead of in Figma.

Time Spent:

765 minutes or almost 13 hours to complete.
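
The data preparation for Tableau (rounding ride length, converting the month number to a name, dropping extra columns, and exporting to .csv) can be sketched in Python as below. This is an illustration only, not the R script used for the project. The function names and the sample row, including its station names, are hypothetical; the dropped column names come from the list above.

```python
import calendar
import csv
import io

# Columns removed to make calculations quicker in Tableau.
COLUMNS_TO_DROP = {"start_station_name", "end_station_name",
                   "time", "started_at", "ended_at"}


def prepare_for_tableau(row):
    """Drop unused columns, round ride_length, and spell out the month."""
    slim = {key: value for key, value in row.items()
            if key not in COLUMNS_TO_DROP}
    slim["ride_length"] = round(row["ride_length"], 1)
    # "01" or 1 becomes "January".
    slim["month"] = calendar.month_name[int(row["month"])]
    return slim


def to_csv_text(rows):
    """Serialize prepared rows as CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical row, not taken from the real data.
sample_row = {
    "rideable_type": "classic_bike",
    "member_casual": "member",
    "ride_length": 29.84,
    "month": "01",
    "time": "08:15:00",
    "started_at": "2021-01-04 08:15:00",
    "ended_at": "2021-01-04 08:44:50",
    "start_station_name": "Station A",
    "end_station_name": "Station B",
}
prepared = prepare_for_tableau(sample_row)
print(to_csv_text([prepared]))
```

To write a file instead of a string, the same DictWriter can be pointed at a file opened with newline="".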


Tableau Prototype

Below was my first draft of the dashboard only using Tableau.

Figure 1 - Cyclistic Dashboard Prototype

Prototype using Figma Background

Combined Figma and Tableau (used dashboard created in Figma as the background for my Tableau Dashboard) to create a final prototype.

Figure 2 - Cyclistic Dashboard Prototype with Figma background

Final Dashboard V1

Made minor edits to design elements and created final dashboard. This was the original final dashboard.

Figure 3 - Cyclistic Dashboard V1

Misc.

I am including the other tools I used.

  • Figma to create my background and help develop the dashboard aesthetics.

  • Google Docs helped me keep track of all of my documents for this project like:

    • Date Log - I wrote down what I did that day related to my project

    • Resources - A list of resources I frequently used

    • Case Notes - Notes for the case study including the final insights, what I was looking for, and anything else having to do with the case

  • Evernote to draft this article before I uploaded it here.

 

FINISHED PROJECT

Here is my finished project: Google Capstone Project (V2). You can view the links to my R code on Github used for analysis here and the code for Tableau here.


Note: This is V2, which includes a few minor changes to the dashboard:

  • Adding horizontal grid lines to a few of the charts

  • Updating the tool tips.

  • Making all of the top metric values (e.g. Total Rides, Average Ride Length, etc.) interactive in Tableau instead of in Figma.


Final Dashboard - Cyclistic Dashboard V2

 

SUMMARY OF DATA


Those who purchase single-ride or full-day passes are referred to as casual riders while those who purchase annual memberships are Cyclistic members.


Data:


Total Rides by User Type

Members had more rides, with 2,328,763 total rides or 56%, while casual riders had 1,823,376 total rides or 44%.


Total Rides by Rider Type Pie chart

Total Rides per Bike Type

Both casual riders and members used the classic bike the most with 1,777,593 rides or 43% of total rides, followed by docked bikes with 1,545,936 rides or 37% of total rides, and lastly with electric bikes at 828,610 rides or 20% of total rides.


Total Rides per Bike Type - bar chart
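
The percentages in this section are each count divided by the total number of rides. As a quick check, a small Python sketch (the function name shares is just for illustration) reproduces the bike type percentages from the ride counts reported above:

```python
def shares(counts):
    """Return each count as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {name: round(100 * count / total) for name, count in counts.items()}


# Ride counts per bike type as reported in this section.
bike_type_rides = {
    "classic": 1_777_593,
    "docked": 1_545_936,
    "electric": 828_610,
}
print(sum(bike_type_rides.values()))  # 4152139, the total row count
print(shares(bike_type_rides))  # {'classic': 43, 'docked': 37, 'electric': 20}
```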

Average Ride Length by User Type

The total average ride length was 24 minutes. For casual riders it was longer at 37 minutes, while for members it was 14 minutes.


Average ride length by rider type

Average Ride Length per Weekday

Average ride length increased on the weekends for both casual riders and members. For both groups, Sunday was the longest at 31 minutes.


average ride length per weekday - bar chart

Total Rides by Weekday

Saturday was the most popular day of the week for casual riders and members combined, with 784,239 rides or 19% of total rides. For members alone, however, Wednesday was the most popular day with 356,060 rides, 5,407 rides more than Saturday.


Total rides by weekday - bar chart

Total Rides by Hour

5PM (17:00) was the busiest hour for both members and casual riders, with 426,685 rides or 10% of total rides. Rides typically began increasing in the morning at 6AM, rose until 5PM, and dropped afterwards. The afternoon was the busiest time of day for both rider types, with 1,905,797 rides or 46% of total rides. 4AM was the least popular hour.

Total rides by hour

Total Rides by Month

July was the busiest month for casual riders and members combined, at 691,476 rides or 17% of total rides, and summer was the most popular season for both at 1,903,446 rides or 46% of total rides. Looking at just members, August was actually the busiest month with 323,140 rides, 816 rides more than July. Winter was the least popular season and February was the least popular month.

Total bike rides per month - bar chart

Final Summary

  • The most popular bike among both rider types was the classic.

  • The busiest time of day was the afternoon, and the peak hour was 5PM for both casual riders and members.

  • The busiest weekday was Saturday; casual riders used the service the most on the weekends.

  • The busiest season was summer for both types of riders.

  • Members took the most rides, but casual riders weren't far behind.

  • The average ride length was 24 minutes, but casual riders on average rode 23 minutes longer than members.

 

BUSINESS SUGGESTIONS

This was the hardest part of the whole project for me. I have never provided suggestions for a business or worked in marketing. Any feedback here would be appreciated.


These are my suggestions for the marketing team to convert casual riders to annual members:

  1. Personalize discounts and show perks in the membership program based on their preferences and riding habits.

  2. Emphasize the benefits of memberships, including discounts during busy times of the year like during Summer, or on the weekends.

  3. Have existing members share their stories about how using Cyclistic's system has changed their lives, to create a sense of community. Offer them a discount for doing so; this will help encourage new riders to join the program.

 

WHAT I LEARNED

Below is what I learned/practiced from over 40 hours spent on this project:

  • Pivot Tables in Microsoft Excel

  • Using R for data cleaning and analysis, specifically with the tidyverse package

  • Building graphs in Tableau, including editing visual elements and creating different charts and filters

  • Design elements of an effective dashboard

  • Combining the design feature of Figma with the functionality of Tableau

 

© Kelly J. Adams - 2024
