Google Capstone Project: Cyclistic Bike Share π΄
Introduction
As part of the Google Data Analytics Professional Certificate, this case study was chosen for my Capstone Project. It will consist of the following processes as taught in the programme; Ask, Prepare, Process, Analyse, Share and Act. Due to the large nature of the dataset, R was used in order to clean, manipulate, analyse and visualise. Tableau was used for creating the visualisations, you can access my dashboard here!
Scenario
Based in Chicago, Cyclistic is a fictional bike share company. Lily Moreno, the marketing director, believes maximising the number of annual memberships will lead to the companyβs future success. Hence, the data analytics team wants to understand how casual riders and annual member riders use Cyclistic bikes differently. From these insights, a new marketing strategy will be designed to convert casual riders into member riders.
About the company
Launched in 2016, Cyclistic features more than 5,800 geo-tracked bicycles and almost 700 docking stations. Cyclistic sets itself apart by offering reclining bikes, hand tricycles and cargo bikes, ensuring bike share is more inclusive to people with disabilities. These bikes can be unlocked at one station and returned to another at anytime.
The company has flexible pricing plans: single ride passes, full day passes, and annual memberships. Casual riders are those who purchase single ride or full day passes whilst Member riders are those who purchase annual memberships.
Cyclisticβs finance analysts have concluded that member riders are more profitable than casual riders. Although the pricing flexibility helps the company attract more customers, Moreno believes that maximising the number of member riders will be key to future growth. She also believes there is a very good chance to convert casual riders into member riders, instead of focusing on creating a marketing campaign that targets all new customers.
Business task
Analyse the companyβs historical bike trip data to identify trends into how casual riders and member riders use Cyclisticβs bikes differently.
Stakeholders
- Lily Moreno: Cyclisticβs marketing director, responsible for the development of campaigns and initiatives to promote the bike share programme.
- Cyclisticβs marketing analytics team: A team of data analysts who are responsible for collecting, analysing and reporting data that helps guide the marketing strategy.
- Cyclisticβs executive team: Responsible for deciding whether to approve the recommended marketing programme or not.
Ask
The following three questions will guide the analysis:
- How do casual riders and member riders use Cyclistic bikes differently?
- Why would casual riders buy a Cyclistic annual membership?
- How can Cyclistic use digital media to influence casual riders to become member riders?
Prepare
The data used in the project was made readily available by Motivate International Inc. under this licence. The datasets are named different as Cyclistic is fictional company. Divvy, the name on the files, is a real bike share system operating in Chicago with similar numbers of bicycles and docking stations. Therefore, the data is appropriate and will allow us to explore how both customer types are using Cyclistic bikes.
Data organisation
Twelve CSV files consisting of the data sets between β202110-divvy-tripdata.zipβ and β202209-divvy-tripdata.zipβ were downloaded. This data corresponds to the period between November 2021 and October 2022. Various data can be found on the data sets such as available bike types for rental, the date and time of each rental bike and different start and finish stations can be found, alongside more information. The data set can be found here.
Due to data privacy issues, it is imperative to note that it prevents us to have access to the riderβs personally identifiable information. This means that it will not be possible to connect pass purchases to credit card numbers to determine whether casual riders are from the Cyclistic service area, or if they have purchased many single ride passes.
Data integrity
The data was collected first hand by Cyclistic, thus making it trusted. This then makes the data free from bias and would reflect the overall population. Moreover, the data is consistent as itβs provided in comma separated format (CSV) files and organised into rows and columns.
Data security
As encryption allows riderβs personally identifiable information to be protected, a copy of the downloaded files is backed up.
Load required R packages
library(tidyverse)
library(data.table)
library(readr)
library(lubridate)
library(janitor)
library(skimr)
library(dplyr)
library(tidyr)
Import and load the data sets
trip_1 <- read_csv("Documents/Cyclist_Bike_Share /Nov_Trip_Data.csv")
trip_2 <- read_csv("Documents/Cyclist_Bike_Share /Dec_Trip_Data.csv")
trip_3 <- read_csv("Documents/Cyclist_Bike_Share /Jan_Trip_Data.csv")
trip_4 <- read_csv("Documents/Cyclist_Bike_Share /Feb_Trip_Data.csv")
trip_5 <- read_csv("Documents/Cyclist_Bike_Share /Mar_Trip_Data.csv")
trip_6 <- read_csv("Documents/Cyclist_Bike_Share /Apr_Trip_Data.csv")
trip_7 <- read_csv("Documents/Cyclist_Bike_Share /May_Trip_Data.csv")
trip_8 <- read_csv("Documents/Cyclist_Bike_Share /Jun_Trip_Data.csv")
trip_9 <- read_csv("Documents/Cyclist_Bike_Share /Jul_Trip_Data.csv")
trip_10 <- read_csv("Documents/Cyclist_Bike_Share /Aug_Trip_Data.csv")
trip_11 <- read_csv("Documents/Cyclist_Bike_Share /Sep_Trip_Data.csv")
trip_12 <- read_csv("Documents/Cyclist_Bike_Share /Oct_Trip_Data.csv")
Console output from loading data sets
> trip_1 <- read_csv("Documents/Cyclist_Bike_Share /Nov_Trip_Data.csv")
Rows: 359978 Columns: 13
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, member_casual
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
βΉ Use `spec()` to retrieve the full column specification for this data.
βΉ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> trip_2 <- read_csv("Documents/Cyclist_Bike_Share /Dec_Trip_Data.csv")
Rows: 247540 Columns: 13
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, member_casual
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
βΉ Use `spec()` to retrieve the full column specification for this data.
βΉ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> trip_3 <- read_csv("Documents/Cyclist_Bike_Share /Jan_Trip_Data.csv")
Rows: 103770 Columns: 13
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, member_casual
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
βΉ Use `spec()` to retrieve the full column specification for this data.
βΉ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> trip_4 <- read_csv("Documents/Cyclist_Bike_Share /Feb_Trip_Data.csv")
Rows: 115609 Columns: 13
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, member_casual
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
βΉ Use `spec()` to retrieve the full column specification for this data.
βΉ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> trip_5 <- read_csv("Documents/Cyclist_Bike_Share /Mar_Trip_Data.csv")
Rows: 284042 Columns: 13
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, member_casual
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
βΉ Use `spec()` to retrieve the full column specification for this data.
βΉ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> trip_6 <- read_csv("Documents/Cyclist_Bike_Share /Apr_Trip_Data.csv")
Rows: 371249 Columns: 13
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, member_casual
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
βΉ Use `spec()` to retrieve the full column specification for this data.
βΉ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> trip_7 <- read_csv("Documents/Cyclist_Bike_Share /May_Trip_Data.csv")
Rows: 634858 Columns: 13
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, member_casual
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
βΉ Use `spec()` to retrieve the full column specification for this data.
βΉ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> trip_8 <- read_csv("Documents/Cyclist_Bike_Share /Jun_Trip_Data.csv")
Rows: 769204 Columns: 13
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, member_casual
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
βΉ Use `spec()` to retrieve the full column specification for this data.
βΉ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> trip_9 <- read_csv("Documents/Cyclist_Bike_Share /Jul_Trip_Data.csv")
Rows: 823488 Columns: 13
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, member_casual
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
βΉ Use `spec()` to retrieve the full column specification for this data.
βΉ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> trip_10 <- read_csv("Documents/Cyclist_Bike_Share /Aug_Trip_Data.csv")
Rows: 785932 Columns: 13
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, member_casual
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
βΉ Use `spec()` to retrieve the full column specification for this data.
βΉ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> trip_11 <- read_csv("Documents/Cyclist_Bike_Share /Sep_Trip_Data.csv")
Rows: 701339 Columns: 13
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, member_casual
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
βΉ Use `spec()` to retrieve the full column specification for this data.
βΉ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> trip_12 <- read_csv("Documents/Cyclist_Bike_Share /Oct_Trip_Data.csv")
Rows: 558685 Columns: 13
ββ Column specification ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Delimiter: ","
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_station_name, end_station_id, member_casual
dbl (4): start_lat, start_lng, end_lat, end_lng
dttm (2): started_at, ended_at
Merge all data sets into a single data frame
all_trips <-
rbind(trip_1, trip_2, trip_3, trip_4, trip_5, trip_6, trip_7, trip_8, trip_9, trip_10, trip_11, trip_12)
Process
In order to process the data sets, R programming language will be used as aforementioned. In this section, the data will get cleaned and manipulated for further analysis.
Identify and view data set using functions
head(all_trips)
Console output
# A tibble: 6 Γ 13
**ride_id** **rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id start_lat start_lng end_lat end_lng membeβ¦ΒΉ**
<chr> <chr> <dttm> <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 7C00A93E10556E47 electric_bike 2021-11-27 13:27:38 2021-11-27 13:46:38 NA NA NA NA 41.9 -87.7 42.0 -87.7 casual
2 90854840DFD508BA electric_bike 2021-11-27 13:38:25 2021-11-27 13:56:10 NA NA NA NA 42.0 -87.7 41.9 -87.7 casual
3 0A7D10CDD144061C electric_bike 2021-11-26 22:03:34 2021-11-26 22:05:56 NA NA NA NA 42.0 -87.7 42.0 -87.7 casual
4 2F3BE33085BCFF02 electric_bike 2021-11-27 09:56:49 2021-11-27 10:01:50 NA NA NA NA 41.9 -87.8 41.9 -87.8 casual
5 D67B4781A19928D4 electric_bike 2021-11-26 19:09:28 2021-11-26 19:30:41 NA NA NA NA 41.9 -87.6 41.9 -87.6 casual
6 02F85C2C3C5F7D46 electric_bike 2021-11-26 18:34:07 2021-11-26 18:52:49 Michigan Ave & Oak St 13042 NA NA 41.9 -87.6 41.9 -87.6 casual
# β¦ with abbreviated variable name ΒΉmember_casu
tail(all_trips)
Console output
# A tibble: 6 Γ 13
**ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_name end_sβ¦ΒΉ startβ¦Β² startβ¦Β³ end_lat end_lng membeβ¦β΄**
<chr> <chr> <dttm> <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 DA551F0A9C0DBCA2 classic_bike 2022-10-24 17:45:38 2022-10-24 17:48:02 Sedgwick St & North Ave TA1307000038 Lincoln Ave & Roscoe ⦠chargi⦠41.9 -87.6 41.9 -87.7 member
2 BC3BFA659C9AB6F1 classic_bike 2022-10-30 01:41:29 2022-10-30 01:57:16 Clifton Ave & Armitage Ave TA1307000163 Lincoln Ave & Roscoe ⦠chargi⦠41.9 -87.7 41.9 -87.7 casual
3 ACD65450291CF95F classic_bike 2022-10-30 01:41:54 2022-10-30 01:57:09 Clifton Ave & Armitage Ave TA1307000163 Lincoln Ave & Roscoe ⦠chargi⦠41.9 -87.7 41.9 -87.7 casual
4 4AAC03D1438E97CA classic_bike 2022-10-15 09:34:11 2022-10-15 10:03:21 Sedgwick St & North Ave TA1307000038 Wabash Ave & Grand Ave TA1307β¦ 41.9 -87.6 41.9 -87.6 casual
5 8E6F3F29785E5D40 classic_bike 2022-10-09 10:21:34 2022-10-09 10:43:45 Sedgwick St & North Ave TA1307000038 Damen Ave & Clybourn β¦ 13271 41.9 -87.6 41.9 -87.7 member
6 8D14CBE672431920 docked_bike 2022-10-22 13:17:13 2022-10-22 13:46:14 Clark St & Armitage Ave 13146 Wabash Ave & Grand Ave TA1307β¦ 41.9 -87.6 41.9 -87.6 casual
# β¦ with abbreviated variable names ΒΉend_station_id, Β²start_lat, Β³start_lng, β΄member_casua
glimpse(all_trips)
Console output
Rows: 5,755,694
Columns: 13
$ ride_id <chr> "7C00A93E10556E47", "90854840DFD508BA", "0A7D10CDD144061C", "2F3BE33085BCFF02", "D67B4781A19928D4", "02F85C2C3C5F7D46", "EF780B807EF7835A", "17069CC749126036"β¦
$ rideable_type <chr> "electric_bike", "electric_bike", "electric_bike", "electric_bike", "electric_bike", "electric_bike", "electric_bike", "electric_bike", "electric_bike", "elecβ¦
$ started_at <dttm> 2021-11-27 13:27:38, 2021-11-27 13:38:25, 2021-11-26 22:03:34, 2021-11-27 09:56:49, 2021-11-26 19:09:28, 2021-11-26 18:34:07, 2021-11-27 13:31:12, 2021-11-27β¦
$ ended_at <dttm> 2021-11-27 13:46:38, 2021-11-27 13:56:10, 2021-11-26 22:05:56, 2021-11-27 10:01:50, 2021-11-26 19:30:41, 2021-11-26 18:52:49, 2021-11-27 13:37:12, 2021-11-27β¦
$ start_station_name <chr> NA, NA, NA, NA, NA, "Michigan Ave & Oak St", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, Nβ¦
$ start_station_id <chr> NA, NA, NA, NA, NA, "13042", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, Nβ¦
$ end_station_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Clark St β¦
$ end_station_id <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "TA1309000β¦
$ start_lat <dbl> 41.93000, 41.96000, 41.96000, 41.94000, 41.90000, 41.90086, 41.81000, 41.95000, 41.80000, 41.78000, 41.91000, 41.92000, 41.92000, 41.92000, 41.90000, 41.78000β¦
$ start_lng <dbl> -87.72000, -87.70000, -87.70000, -87.79000, -87.63000, -87.62379, -87.60000, -87.66000, -87.60000, -87.60000, -87.72000, -87.71000, -87.71000, -87.78000, -87.β¦
$ end_lat <dbl> 41.96000, 41.92000, 41.96000, 41.93000, 41.88000, 41.90000, 41.80000, 41.95000, 41.79000, 41.80000, 41.92000, 41.97000, 41.91000, 41.92000, 41.88000, 41.80000β¦
$ end_lng <dbl> -87.73000, -87.70000, -87.70000, -87.79000, -87.62000, -87.63000, -87.60000, -87.66000, -87.60000, -87.59000, -87.71000, -87.68000, -87.72000, -87.78000, -87.β¦
$ member_casual <chr> "casual", "casual", "casual", "casual", "casual", "casual", "casual", "casual", "casual", "casual", "casual", "casual", "casual", "casual", "casual", "casual"β¦
colnames(all_trips)
Console output
[1] "ride_id" [2] "rideable_type" [3] "started_at" [4] "ended_at" [5] "start_station_name". [6] "start_station_id" [7] "end_station_name" [8] "end_station_id"
[9] "start_lat" [10] "start_lng" [11] "end_lat" [12] "end_lng" [13] "member_casual"
nrow(all_trips)
Console output
5755694
dim(all_trips)
Console output
[1] 5755694 [2] 13
summary(all_trips)
Console output
ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id
Length:5755694 Length:5755694 Min. :2021-11-01 00:00:14.00 Min. :2021-11-01 00:04:06.00 Length:5755694 Length:5755694 Length:5755694 Length:5755694
Class :character Class :character 1st Qu.:2022-04-27 16:40:09.00 1st Qu.:2022-04-27 16:51:40.25 Class :character Class :character Class :character Class :character
Mode :character Mode :character Median :2022-06-30 18:31:03.00 Median :2022-06-30 18:49:28.00 Mode :character Mode :character Mode :character Mode :character
Mean :2022-06-13 23:04:32.59 Mean :2022-06-13 23:23:58.99
3rd Qu.:2022-08-24 19:52:19.75 3rd Qu.:2022-08-24 20:10:05.75
Max. :2022-10-31 23:59:33.00 Max. :2022-11-07 04:53:58.00
start_lat start_lng end_lat end_lng member_casual
Min. :41.64 Min. :-87.84 Min. :41.39 Min. :-88.97 Length:5755694
1st Qu.:41.88 1st Qu.:-87.66 1st Qu.:41.88 1st Qu.:-87.66 Class :character
Median :41.90 Median :-87.64 Median :41.90 Median :-87.64 Mode :character
Mean :41.90 Mean :-87.65 Mean :41.90 Mean :-87.65
3rd Qu.:41.93 3rd Qu.:-87.63 3rd Qu.:41.93 3rd Qu.:-87.63
Max. :45.64 Max. :-73.80 Max. :42.37 Max. :-87.30
NA's :5835 NA's :5835
str(all_trips)
Console output
spc_tbl_ [5,755,694 Γ 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ride_id : chr [1:5755694] "7C00A93E10556E47" "90854840DFD508BA" "0A7D10CDD144061C" "2F3BE33085BCFF02" ...
$ rideable_type : chr [1:5755694] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
$ started_at : POSIXct[1:5755694], format: "2021-11-27 13:27:38" "2021-11-27 13:38:25" "2021-11-26 22:03:34" "2021-11-27 09:56:49" ...
$ ended_at : POSIXct[1:5755694], format: "2021-11-27 13:46:38" "2021-11-27 13:56:10" "2021-11-26 22:05:56" "2021-11-27 10:01:50" ...
$ start_station_name: chr [1:5755694] NA NA NA NA ...
$ start_station_id : chr [1:5755694] NA NA NA NA ...
$ end_station_name : chr [1:5755694] NA NA NA NA ...
$ end_station_id : chr [1:5755694] NA NA NA NA ...
$ start_lat : num [1:5755694] 41.9 42 42 41.9 41.9 ...
$ start_lng : num [1:5755694] -87.7 -87.7 -87.7 -87.8 -87.6 ...
$ end_lat : num [1:5755694] 42 41.9 42 41.9 41.9 ...
$ end_lng : num [1:5755694] -87.7 -87.7 -87.7 -87.8 -87.6 ...
$ member_casual : chr [1:5755694] "casual" "casual" "casual" "casual" ...
- attr(*, "spec")=
.. cols(
.. ride_id = col_character(),
.. rideable_type = col_character(),
.. started_at = col_datetime(format = ""),
.. ended_at = col_datetime(format = ""),
.. start_station_name = col_character(),
.. start_station_id = col_character(),
.. end_station_name = col_character(),
.. end_station_id = col_character(),
.. start_lat = col_double(),
.. start_lng = col_double(),
.. end_lat = col_double(),
.. end_lng = col_double(),
.. member_casual = col_character()
.. )
- attr(*, "problems")=<externalptr>
Data cleaning
The data set had multiple null values which needed to be sorted and filtered. This was achieved by running the following code.
Remove N/A values
is.na(all_trips)
colSums(is.na(all_trips))
Console output
**ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id start_lat start_lng end_lat end_lng member_casual**
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
[ reached getOption("max.print") -- omitted 5755618 rows ]
> colSums(is.na(all_trips))
ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id start_lat
0 0 0 0 878177 878177 940010 940010 0
start_lng end_lat end_lng member_casual
0 5835 5835 0
all_trips <- all_trips[rowSums(is.na(all_trips))==0,]
is.na(all_trips)
Console output
**ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id start_lat start_lng end_lat end_lng member_casual**
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[ reached getOption("max.print") -- omitted 4410362 rows ]
glimpse(all_trips)
Console output
Rows: 4,410,438
Columns: 13
$ ride_id <chr> "4CA9676997DAFFF6", "F3E84A230AF2D676", "A1F2C92308007968", "9B871C3B14E9BEC4", "2A81E957DD24A3DC", "A57F8F504A01004E", "C79A70DFD07E75D9", "41E7F8FB5640B521"β¦
$ rideable_type <chr> "classic_bike", "classic_bike", "electric_bike", "classic_bike", "classic_bike", "electric_bike", "electric_bike", "electric_bike", "electric_bike", "electricβ¦
$ started_at <dttm> 2021-11-26 10:27:28, 2021-11-15 09:35:03, 2021-11-10 16:27:02, 2021-11-09 19:51:36, 2021-11-06 19:14:10, 2021-11-18 11:58:24, 2021-11-23 22:14:11, 2021-11-05β¦
$ ended_at <dttm> 2021-11-26 11:22:13, 2021-11-15 09:42:08, 2021-11-10 17:04:28, 2021-11-09 20:11:17, 2021-11-06 19:33:19, 2021-11-18 12:08:35, 2021-11-23 22:44:01, 2021-11-05β¦
$ start_station_name <chr> "Michigan Ave & Oak St", "Clark St & Grace St", "Leamington Ave & Hirsch St", "Desplaines St & Kinzie St", "Larrabee St & Armitage Ave", "Michigan Ave & Oak Sβ¦
$ start_station_id <chr> "13042", "TA1307000127", "307", "TA1306000003", "TA1309000006", "13042", "604", "TA1307000127", "TA1306000003", "KA1503000043", "TA1306000003", "KA1503000043"β¦
$ end_station_name <chr> "Michigan Ave & Oak St", "Clark St & Leland Ave", "Leamington Ave & Hirsch St", "Desplaines St & Kinzie St", "Michigan Ave & Oak St", "Michigan Ave & Oak St",β¦
$ end_station_id <chr> "13042", "TA1309000014", "307", "TA1306000003", "13042", "13042", "604", "TA1309000014", "TA1305000006", "TA1306000003", "KA1503000043", "KA1503000043", "TA13β¦
$ start_lat <dbl> 41.90096, 41.95078, 41.91000, 41.88872, 41.91808, 41.90114, 42.05820, 41.95086, 41.88863, 41.88915, 41.88872, 41.88918, 41.88918, 41.88132, 41.90096, 41.95097β¦
$ start_lng <dbl> -87.62378, -87.65917, -87.75000, -87.64445, -87.64375, -87.62353, -87.67755, -87.65913, -87.64437, -87.63849, -87.64445, -87.63851, -87.63851, -87.62952, -87.β¦
$ end_lat <dbl> 41.90096, 41.96710, 41.91000, 41.88872, 41.90096, 41.90100, 42.05825, 41.96707, 41.88160, 41.88872, 41.88918, 41.88918, 41.88132, 41.88132, 41.90096, 41.96712β¦
$ end_lng <dbl> -87.62378, -87.66743, -87.75000, -87.64445, -87.62378, -87.62375, -87.67752, -87.66745, -87.63013, -87.64427, -87.63851, -87.63851, -87.62952, -87.62952, -87.β¦
$ member_casual <chr> "casual", "casual", "casual", "casual", "casual", "casual", "casual", "member", "member", "member", "member", "member", "member", "member", "casual", "member"β¦
The null values have now been viewed and removed correspondingly. This will then enable ease for data manipulation.
Data manipulation
At this stage the data was altered for easier use and to make calculations.
Create βride_distanceβ column to calculate rider distances
all_trips <- all_trips %>%
mutate(ride_distance = distHaversine(cbind(start_lng, start_lat),cbind(end_lng,end_lat)))
head(all_trips)
Console output
# A tibble: 6 Γ 14
**ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_β¦ΒΉ end_sβ¦Β² startβ¦Β³ startβ¦β΄ end_lat end_lng membeβ¦β΅ ride_β¦βΆ**
<chr> <chr> <dttm> <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 4CA9676997DAFFF6 classic_bike 2021-11-26 10:27:28 2021-11-26 11:22:13 Michigan Ave & Oak St 13042 Michigan Ave β¦ 13042 41.9 -87.6 41.9 -87.6 casual 0
2 F3E84A230AF2D676 classic_bike 2021-11-15 09:35:03 2021-11-15 09:42:08 Clark St & Grace St TA1307000127 Clark St & Le⦠TA1309⦠42.0 -87.7 42.0 -87.7 casual 1941.
3 A1F2C92308007968 electric_bike 2021-11-10 16:27:02 2021-11-10 17:04:28 Leamington Ave & Hirsch St 307 Leamington Av⦠307 41.9 -87.8 41.9 -87.8 casual 0
4 9B871C3B14E9BEC4 classic_bike 2021-11-09 19:51:36 2021-11-09 20:11:17 Desplaines St & Kinzie St TA1306000003 Desplaines St⦠TA1306⦠41.9 -87.6 41.9 -87.6 casual 0
5 2A81E957DD24A3DC classic_bike 2021-11-06 19:14:10 2021-11-06 19:33:19 Larrabee St & Armitage Ave TA1309000006 Michigan Ave β¦ 13042 41.9 -87.6 41.9 -87.6 casual 2524.
6 A57F8F504A01004E electric_bike 2021-11-18 11:58:24 2021-11-18 12:08:35 Michigan Ave & Oak St 13042 Michigan Ave β¦ 13042 41.9 -87.6 41.9 -87.6 casual 23.7
# β¦ with abbreviated variable names ΒΉend_station_name, Β²end_station_id, Β³start_lat, β΄start_lng, β΅member_casual, βΆride_distance
Change member_casual column to βusertypeβ
all_trips <- all_trips %>%
rename(usertype = member_casual)
colnames(all_trips)
Console output
[1] "ride_id" [2] "rideable_type" [3] "started_at" [4] "ended_at" [5] "start_station_name" [6] "start_station_id" [7] "end_station_name" [8] "end_station_id"
[9] "start_lat". [10] "start_lng" [11] "end_lat" [12] "end_lng" [13] "usertype" [14] "ride_distance"
table(all_trips$usertype)
Console output
casual member
1768182 2642256
head(all_trips)
Console output
# A tibble: 6 Γ 14
**ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_β¦ΒΉ end_sβ¦Β² startβ¦Β³ startβ¦β΄ end_lat end_lng usertβ¦β΅ ride_β¦βΆ**
<chr> <chr> <dttm> <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 4CA9676997DAFFF6 classic_bike 2021-11-26 10:27:28 2021-11-26 11:22:13 Michigan Ave & Oak St 13042 Michigan Ave β¦ 13042 41.9 -87.6 41.9 -87.6 casual 0
2 F3E84A230AF2D676 classic_bike 2021-11-15 09:35:03 2021-11-15 09:42:08 Clark St & Grace St TA1307000127 Clark St & Le⦠TA1309⦠42.0 -87.7 42.0 -87.7 casual 1941.
3 A1F2C92308007968 electric_bike 2021-11-10 16:27:02 2021-11-10 17:04:28 Leamington Ave & Hirsch St 307 Leamington Av⦠307 41.9 -87.8 41.9 -87.8 casual 0
4 9B871C3B14E9BEC4 classic_bike 2021-11-09 19:51:36 2021-11-09 20:11:17 Desplaines St & Kinzie St TA1306000003 Desplaines St⦠TA1306⦠41.9 -87.6 41.9 -87.6 casual 0
5 2A81E957DD24A3DC classic_bike 2021-11-06 19:14:10 2021-11-06 19:33:19 Larrabee St & Armitage Ave TA1309000006 Michigan Ave β¦ 13042 41.9 -87.6 41.9 -87.6 casual 2524.
6 A57F8F504A01004E electric_bike 2021-11-18 11:58:24 2021-11-18 12:08:35 Michigan Ave & Oak St 13042 Michigan Ave β¦ 13042 41.9 -87.6 41.9 -87.6 casual 23.7
# β¦ with abbreviated variable names ΒΉend_station_name, Β²end_station_id, Β³start_lat, β΄start_lng, β΅usertype, βΆride_distance
tail(all_trips)
Console output
# A tibble: 6 Γ 14
**ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_β¦ΒΉ end_sβ¦Β² startβ¦Β³ startβ¦β΄ end_lat end_lng usertβ¦β΅ ride_β¦βΆ**
<chr> <chr> <dttm> <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 DA551F0A9C0DBCA2 classic_bike 2022-10-24 17:45:38 2022-10-24 17:48:02 Sedgwick St & North Ave TA1307000038 Lincoln Ave &⦠chargi⦠41.9 -87.6 41.9 -87.7 member 4436.
2 BC3BFA659C9AB6F1 classic_bike 2022-10-30 01:41:29 2022-10-30 01:57:16 Clifton Ave & Armitage Ave TA1307000163 Lincoln Ave &⦠chargi⦠41.9 -87.7 41.9 -87.7 casual 3020.
3 ACD65450291CF95F classic_bike 2022-10-30 01:41:54 2022-10-30 01:57:09 Clifton Ave & Armitage Ave TA1307000163 Lincoln Ave &⦠chargi⦠41.9 -87.7 41.9 -87.7 casual 3020.
4 4AAC03D1438E97CA classic_bike 2022-10-15 09:34:11 2022-10-15 10:03:21 Sedgwick St & North Ave TA1307000038 Wabash Ave & β¦ TA1307β¦ 41.9 -87.6 41.9 -87.6 casual 2427.
5 8E6F3F29785E5D40 classic_bike 2022-10-09 10:21:34 2022-10-09 10:43:45 Sedgwick St & North Ave TA1307000038 Damen Ave & C⦠13271 41.9 -87.6 41.9 -87.7 member 3970.
6 8D14CBE672431920 docked_bike 2022-10-22 13:17:13 2022-10-22 13:46:14 Clark St & Armitage Ave 13146 Wabash Ave & β¦ TA1307β¦ 41.9 -87.6 41.9 -87.6 casual 3090.
# β¦ with abbreviated variable names ΒΉend_station_name, Β²end_station_id, Β³start_lat, β΄start_lng, β΅usertype, βΆride_distance
dim(all_trips)
Console output
[1] 4410438 [2] 14
Calculate ride data for each day, month and year. Also create a βweekdayβ column
all_trips$date <- as.Date(all_trips$started_at)
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$weekday <- format(as.Date(all_trips$date), "%A")
head(all_trips)
Console output
# A tibble: 6 Γ 19
**ride_id rideable_β¦ΒΉ started_at ended_at startβ¦Β² startβ¦Β³ end_sβ¦β΄ end_sβ¦β΅ startβ¦βΆ startβ¦β· end_lat end_lng usertβ¦βΈ ride_β¦βΉ date month day year weekday**
<chr> <chr> <dttm> <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <date> <chr> <chr> <chr> <chr>
1 4CA9676997DAFFF6 classic_bi⦠2021-11-26 10:27:28 2021-11-26 11:22:13 Michig⦠13042 Michig⦠13042 41.9 -87.6 41.9 -87.6 casual 0 2021-11-26 11 26 2021 Friday
2 F3E84A230AF2D676 classic_bi⦠2021-11-15 09:35:03 2021-11-15 09:42:08 Clark ⦠TA1307⦠Clark ⦠TA1309⦠42.0 -87.7 42.0 -87.7 casual 1941. 2021-11-15 11 15 2021 Monday
3 A1F2C92308007968 electric_bβ¦ 2021-11-10 16:27:02 2021-11-10 17:04:28 Leaminβ¦ 307 Leaminβ¦ 307 41.9 -87.8 41.9 -87.8 casual 0 2021-11-10 11 10 2021 Wednesβ¦
4 9B871C3B14E9BEC4 classic_bi⦠2021-11-09 19:51:36 2021-11-09 20:11:17 Despla⦠TA1306⦠Despla⦠TA1306⦠41.9 -87.6 41.9 -87.6 casual 0 2021-11-09 11 09 2021 Tuesday
5 2A81E957DD24A3DC classic_biβ¦ 2021-11-06 19:14:10 2021-11-06 19:33:19 Larrabβ¦ TA1309β¦ Michigβ¦ 13042 41.9 -87.6 41.9 -87.6 casual 2524. 2021-11-06 11 06 2021 Saturdβ¦
6 A57F8F504A01004E electric_bβ¦ 2021-11-18 11:58:24 2021-11-18 12:08:35 Michigβ¦ 13042 Michigβ¦ 13042 41.9 -87.6 41.9 -87.6 casual 23.7 2021-11-18 11 18 2021 Thursdβ¦
# β¦ with abbreviated variable names ΒΉrideable_type, Β²start_station_name, Β³start_station_id, β΄end_station_name, β΅end_station_id, βΆstart_lat, β·start_lng, βΈusertype, βΉride_distance
tail(all_trips)
Console output
# A tibble: 6 Γ 19
**ride_id rideable_β¦ΒΉ started_at ended_at startβ¦Β² startβ¦Β³ end_sβ¦β΄ end_sβ¦β΅ startβ¦βΆ startβ¦β· end_lat end_lng usertβ¦βΈ ride_β¦βΉ date month day year weekday**
<chr> <chr> <dttm> <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <date> <chr> <chr> <chr> <chr>
1 DA551F0A9C0DBCA2 classic_bi⦠2022-10-24 17:45:38 2022-10-24 17:48:02 Sedgwi⦠TA1307⦠Lincol⦠chargi⦠41.9 -87.6 41.9 -87.7 member 4436. 2022-10-24 10 24 2022 Monday
2 BC3BFA659C9AB6F1 classic_bi⦠2022-10-30 01:41:29 2022-10-30 01:57:16 Clifto⦠TA1307⦠Lincol⦠chargi⦠41.9 -87.7 41.9 -87.7 casual 3020. 2022-10-30 10 30 2022 Sunday
3 ACD65450291CF95F classic_bi⦠2022-10-30 01:41:54 2022-10-30 01:57:09 Clifto⦠TA1307⦠Lincol⦠chargi⦠41.9 -87.7 41.9 -87.7 casual 3020. 2022-10-30 10 30 2022 Sunday
4 4AAC03D1438E97CA classic_biβ¦ 2022-10-15 09:34:11 2022-10-15 10:03:21 Sedgwiβ¦ TA1307β¦ Wabashβ¦ TA1307β¦ 41.9 -87.6 41.9 -87.6 casual 2427. 2022-10-15 10 15 2022 Saturdβ¦
5 8E6F3F29785E5D40 classic_bi⦠2022-10-09 10:21:34 2022-10-09 10:43:45 Sedgwi⦠TA1307⦠Damen ⦠13271 41.9 -87.6 41.9 -87.7 member 3970. 2022-10-09 10 09 2022 Sunday
6 8D14CBE672431920 docked_bike 2022-10-22 13:17:13 2022-10-22 13:46:14 Clark β¦ 13146 Wabashβ¦ TA1307β¦ 41.9 -87.6 41.9 -87.6 casual 3090. 2022-10-22 10 22 2022 Saturdβ¦
# β¦ with abbreviated variable names ΒΉrideable_type, Β²start_station_name, Β³start_station_id, β΄end_station_name, β΅end_station_id, βΆstart_lat, β·start_lng, βΈusertype, βΉride_distance
str(all_trips)
Console output
tibble [4,410,438 Γ 19] (S3: tbl_df/tbl/data.frame)
$ ride_id : chr [1:4410438] "4CA9676997DAFFF6" "F3E84A230AF2D676" "A1F2C92308007968" "9B871C3B14E9BEC4" ...
$ rideable_type : chr [1:4410438] "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
$ started_at : POSIXct[1:4410438], format: "2021-11-26 10:27:28" "2021-11-15 09:35:03" "2021-11-10 16:27:02" "2021-11-09 19:51:36" ...
$ ended_at : POSIXct[1:4410438], format: "2021-11-26 11:22:13" "2021-11-15 09:42:08" "2021-11-10 17:04:28" "2021-11-09 20:11:17" ...
$ start_station_name: chr [1:4410438] "Michigan Ave & Oak St" "Clark St & Grace St" "Leamington Ave & Hirsch St" "Desplaines St & Kinzie St" ...
$ start_station_id : chr [1:4410438] "13042" "TA1307000127" "307" "TA1306000003" ...
$ end_station_name : chr [1:4410438] "Michigan Ave & Oak St" "Clark St & Leland Ave" "Leamington Ave & Hirsch St" "Desplaines St & Kinzie St" ...
$ end_station_id : chr [1:4410438] "13042" "TA1309000014" "307" "TA1306000003" ...
$ start_lat : num [1:4410438] 41.9 42 41.9 41.9 41.9 ...
$ start_lng : num [1:4410438] -87.6 -87.7 -87.8 -87.6 -87.6 ...
$ end_lat : num [1:4410438] 41.9 42 41.9 41.9 41.9 ...
$ end_lng : num [1:4410438] -87.6 -87.7 -87.8 -87.6 -87.6 ...
$ usertype : chr [1:4410438] "casual" "casual" "casual" "casual" ...
$ ride_distance : num [1:4410438] 0 1941 0 0 2524 ...
$ date : Date[1:4410438], format: "2021-11-26" "2021-11-15" "2021-11-10" "2021-11-09" ...
$ month : chr [1:4410438] "11" "11" "11" "11" ...
$ day : chr [1:4410438] "26" "15" "10" "09" ...
$ year : chr [1:4410438] "2021" "2021" "2021" "2021" ...
$ weekday : chr [1:4410438] "Friday" "Monday" "Wednesday" "Tuesday" ...
Calculate trip duration in seconds and store as “ride_lengthβ
all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at)
head(all_trips)
Console output
# A tibble: 6 Γ 20
**ride_id rideaβ¦ΒΉ started_at ended_at startβ¦Β² startβ¦Β³ end_sβ¦β΄ end_sβ¦β΅ startβ¦βΆ startβ¦β· end_lat end_lng usertβ¦βΈ ride_β¦βΉ date month day year weekdayΛ ride_β¦Λ**
<chr> <chr> <dttm> <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <date> <chr> <chr> <chr> <chr> <drtn>
1 4CA9676997Dβ¦ classiβ¦ 2021-11-26 10:27:28 2021-11-26 11:22:13 Michigβ¦ 13042 Michigβ¦ 13042 41.9 -87.6 41.9 -87.6 casual 0 2021-11-26 11 26 2021 Friday 3285 sβ¦
2 F3E84A230AFβ¦ classiβ¦ 2021-11-15 09:35:03 2021-11-15 09:42:08 Clark β¦ TA1307β¦ Clark β¦ TA1309β¦ 42.0 -87.7 42.0 -87.7 casual 1941. 2021-11-15 11 15 2021 Monday 425 sβ¦
3 A1F2C923080β¦ electrβ¦ 2021-11-10 16:27:02 2021-11-10 17:04:28 Leaminβ¦ 307 Leaminβ¦ 307 41.9 -87.8 41.9 -87.8 casual 0 2021-11-10 11 10 2021 Wednesβ¦ 2246 sβ¦
4 9B871C3B14Eβ¦ classiβ¦ 2021-11-09 19:51:36 2021-11-09 20:11:17 Desplaβ¦ TA1306β¦ Desplaβ¦ TA1306β¦ 41.9 -87.6 41.9 -87.6 casual 0 2021-11-09 11 09 2021 Tuesday 1181 sβ¦
5 2A81E957DD2β¦ classiβ¦ 2021-11-06 19:14:10 2021-11-06 19:33:19 Larrabβ¦ TA1309β¦ Michigβ¦ 13042 41.9 -87.6 41.9 -87.6 casual 2524. 2021-11-06 11 06 2021 Saturdβ¦ 1149 sβ¦
6 A57F8F504A0β¦ electrβ¦ 2021-11-18 11:58:24 2021-11-18 12:08:35 Michigβ¦ 13042 Michigβ¦ 13042 41.9 -87.6 41.9 -87.6 casual 23.7 2021-11-18 11 18 2021 Thursdβ¦ 611 sβ¦
# β¦ with abbreviated variable names ΒΉrideable_type, Β²start_station_name, Β³start_station_id, β΄end_station_name, β΅end_station_id, βΆstart_lat, β·start_lng, βΈusertype, βΉride_distance,
# Λweekday, Λride_length
str(all_trips)
Console output
tibble [4,410,438 Γ 20] (S3: tbl_df/tbl/data.frame)
$ ride_id : chr [1:4410438] "4CA9676997DAFFF6" "F3E84A230AF2D676" "A1F2C92308007968" "9B871C3B14E9BEC4" ...
$ rideable_type : chr [1:4410438] "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
$ started_at : POSIXct[1:4410438], format: "2021-11-26 10:27:28" "2021-11-15 09:35:03" "2021-11-10 16:27:02" "2021-11-09 19:51:36" ...
$ ended_at : POSIXct[1:4410438], format: "2021-11-26 11:22:13" "2021-11-15 09:42:08" "2021-11-10 17:04:28" "2021-11-09 20:11:17" ...
$ start_station_name: chr [1:4410438] "Michigan Ave & Oak St" "Clark St & Grace St" "Leamington Ave & Hirsch St" "Desplaines St & Kinzie St" ...
$ start_station_id : chr [1:4410438] "13042" "TA1307000127" "307" "TA1306000003" ...
$ end_station_name : chr [1:4410438] "Michigan Ave & Oak St" "Clark St & Leland Ave" "Leamington Ave & Hirsch St" "Desplaines St & Kinzie St" ...
$ end_station_id : chr [1:4410438] "13042" "TA1309000014" "307" "TA1306000003" ...
$ start_lat : num [1:4410438] 41.9 42 41.9 41.9 41.9 ...
$ start_lng : num [1:4410438] -87.6 -87.7 -87.8 -87.6 -87.6 ...
$ end_lat : num [1:4410438] 41.9 42 41.9 41.9 41.9 ...
$ end_lng : num [1:4410438] -87.6 -87.7 -87.8 -87.6 -87.6 ...
$ usertype : chr [1:4410438] "casual" "casual" "casual" "casual" ...
$ ride_distance : num [1:4410438] 0 1941 0 0 2524 ...
$ date : Date[1:4410438], format: "2021-11-26" "2021-11-15" "2021-11-10" "2021-11-09" ...
$ month : chr [1:4410438] "11" "11" "11" "11" ...
$ day : chr [1:4410438] "26" "15" "10" "09" ...
$ year : chr [1:4410438] "2021" "2021" "2021" "2021" ...
$ weekday : chr [1:4410438] "Friday" "Monday" "Wednesday" "Tuesday" ...
$ ride_length : 'difftime' num [1:4410438] 3285 425 2246 1181 ...
..- attr(*, "units")= chr "secs"
Convert “ride_length” from difftime to numeric to run calculations on data set
is.difftime(all_trips$ride_length)
Console output
TRUE
all_trips$ride_length <- as.numeric(as.difftime(all_trips$ride_length))
is.numeric(all_trips$ride_length)
Console output
TRUE
str(all_trips)
Console output
tibble [4,410,438 Γ 20] (S3: tbl_df/tbl/data.frame)
$ ride_id : chr [1:4410438] "4CA9676997DAFFF6" "F3E84A230AF2D676" "A1F2C92308007968" "9B871C3B14E9BEC4" ...
$ rideable_type : chr [1:4410438] "classic_bike" "classic_bike" "electric_bike" "classic_bike" ...
$ started_at : POSIXct[1:4410438], format: "2021-11-26 10:27:28" "2021-11-15 09:35:03" "2021-11-10 16:27:02" "2021-11-09 19:51:36" ...
$ ended_at : POSIXct[1:4410438], format: "2021-11-26 11:22:13" "2021-11-15 09:42:08" "2021-11-10 17:04:28" "2021-11-09 20:11:17" ...
$ start_station_name: chr [1:4410438] "Michigan Ave & Oak St" "Clark St & Grace St" "Leamington Ave & Hirsch St" "Desplaines St & Kinzie St" ...
$ start_station_id : chr [1:4410438] "13042" "TA1307000127" "307" "TA1306000003" ...
$ end_station_name : chr [1:4410438] "Michigan Ave & Oak St" "Clark St & Leland Ave" "Leamington Ave & Hirsch St" "Desplaines St & Kinzie St" ...
$ end_station_id : chr [1:4410438] "13042" "TA1309000014" "307" "TA1306000003" ...
$ start_lat : num [1:4410438] 41.9 42 41.9 41.9 41.9 ...
$ start_lng : num [1:4410438] -87.6 -87.7 -87.8 -87.6 -87.6 ...
$ end_lat : num [1:4410438] 41.9 42 41.9 41.9 41.9 ...
$ end_lng : num [1:4410438] -87.6 -87.7 -87.8 -87.6 -87.6 ...
$ usertype : chr [1:4410438] "casual" "casual" "casual" "casual" ...
$ ride_distance : num [1:4410438] 0 1941 0 0 2524 ...
$ date : Date[1:4410438], format: "2021-11-26" "2021-11-15" "2021-11-10" "2021-11-09" ...
$ month : chr [1:4410438] "11" "11" "11" "11" ...
$ day : chr [1:4410438] "26" "15" "10" "09" ...
$ year : chr [1:4410438] "2021" "2021" "2021" "2021" ...
$ weekday : chr [1:4410438] "Friday" "Monday" "Wednesday" "Tuesday" ...
$ ride_length : num [1:4410438] 3285 425 2246 1181 1149 ...
Remove rides that are less than 0 seconds and 0 metres
all_trips <- all_trips %>%
+ filter(ride_length > 0) %>%
+ filter(ride_distance > 0)
head(all_trips)
Console output
# A tibble: 6 Γ 20
**ride_id rideaβ¦ΒΉ started_at ended_at startβ¦Β² startβ¦Β³ end_sβ¦β΄ end_sβ¦β΅ startβ¦βΆ startβ¦β· end_lat end_lng usertβ¦βΈ ride_β¦βΉ date month day year weekdayΛ ride_β¦Λ**
<chr> <chr> <dttm> <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <date> <chr> <chr> <chr> <chr> <dbl>
1 F3E84A230AF⦠classi⦠2021-11-15 09:35:03 2021-11-15 09:42:08 Clark ⦠TA1307⦠Clark ⦠TA1309⦠42.0 -87.7 42.0 -87.7 casual 1941. 2021-11-15 11 15 2021 Monday 425
2 2A81E957DD2⦠classi⦠2021-11-06 19:14:10 2021-11-06 19:33:19 Larrab⦠TA1309⦠Michig⦠13042 41.9 -87.6 41.9 -87.6 casual 2524. 2021-11-06 11 06 2021 Saturd⦠1149
3 A57F8F504A0⦠electr⦠2021-11-18 11:58:24 2021-11-18 12:08:35 Michig⦠13042 Michig⦠13042 41.9 -87.6 41.9 -87.6 casual 23.7 2021-11-18 11 18 2021 Thursd⦠611
4 C79A70DFD07⦠electr⦠2021-11-23 22:14:11 2021-11-23 22:44:01 Sherid⦠604 Sherid⦠604 42.1 -87.7 42.1 -87.7 casual 6.63 2021-11-23 11 23 2021 Tuesday 1790
5 41E7F8FB564⦠electr⦠2021-11-05 16:48:10 2021-11-05 16:53:18 Clark ⦠TA1307⦠Clark ⦠TA1309⦠42.0 -87.7 42.0 -87.7 member 1932. 2021-11-05 11 05 2021 Friday 308
6 3DEAAA1701B⦠electr⦠2021-11-18 08:40:38 2021-11-18 08:48:56 Despla⦠TA1306⦠Dearbo⦠TA1305⦠41.9 -87.6 41.9 -87.6 member 1416. 2021-11-18 11 18 2021 Thursd⦠498
# β¦ with abbreviated variable names ΒΉrideable_type, Β²start_station_name, Β³start_station_id, β΄end_station_name, β΅end_station_id, βΆstart_lat, β·start_lng, βΈusertype, βΉride_distance,
# Λweekday, Λride_length
dim(all_trips)
Console output
[1] 4207351 [2] 20
Analyse
This section will cover the calculations performed, trends and relationships identified and a conduction of descriptive analysis on the data set.
Descriptive ride length (in seconds) analysis
summary(all_trips$ride_length)
Console output
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.0 371 636 992.6 1114 2061244
Summary statistics to compare ride length of casual and member riders
aggregate(all_trips$ride_length ~ all_trips$usertype, FUN = mean)
all_trips$usertype all_trips$ride_length
aggregate(all_trips$ride_length ~ all_trips$usertype, FUN = median)
all_trips$usertype all_trips$ride_length
aggregate(all_trips$ride_length ~ all_trips$usertype, FUN = max)
all_trips$usertype all_trips$ride_length
aggregate(all_trips$ride_length ~ all_trips$usertype, FUN = min)
all_trips$usertype all_trips$ride_length
Console output
1 casual 1384.6729
2 member 741.3947
1 casual 826
2 member 541
1 casual 2061244
2 member 89575
1 casual 1
2 member 1
Summary statistics to compare ride distance of casual and member riders
aggregate(all_trips$ride_distance ~ all_trips$usertype, FUN = mean)
all_trips$usertype all_trips$ride_distance
aggregate(all_trips$ride_distance ~ all_trips$usertype, FUN = median)
all_trips$usertype all_trips$ride_distance
aggregate(all_trips$ride_distance ~ all_trips$usertype, FUN = max)
all_trips$usertype all_trips$ride_distance
aggregate(all_trips$ride_distance ~ all_trips$usertype, FUN = min)
all_trips$usertype all_trips$ride_distance
Console output
1 casual 2323.760
2 member 2111.424
1 casual 1780.596
2 member 1548.659
1 casual 1190854.54
2 member 27518.42
1 casual 0.03321865
2 member 0.02016286
Average ride length by weekday for casual and member riders
aggregate (all_trips$ride_length ~ all_trips$usertype + all_trips$weekday, FUN = mean)
all_trips$usertype all_trips$weekday all_trips$ride_length
Console output
1 casual Friday 1298.8332
2 member Friday 727.8972
3 casual Monday 1413.7970
4 member Monday 712.3448
5 casual Saturday 1550.9885
6 member Saturday 838.9709
7 casual Sunday 1563.3295
8 member Sunday 825.7307
9 casual Thursday 1242.5742
10 member Thursday 715.6762
11 casual Tuesday 1231.2016
12 member Tuesday 700.8877
13 casual Wednesday 1197.2032
14 member Wednesday 704.7405
Average ride distance by weekday for casual and member riders
aggregate (all_trips$ride_distance ~ all_trips$usertype + all_trips$weekday, FUN = mean)
all_trips$usertype all_trips$weekday all_trips$ride_distance
Console output
1 casual Friday 2273.075
2 member Friday 2066.173
3 casual Monday 2258.196
4 member Monday 2067.225
5 casual Saturday 2462.863
6 member Saturday 2230.627
7 casual Sunday 2418.339
8 member Sunday 2184.793
9 casual Thursday 2256.727
10 member Thursday 2096.145
11 casual Tuesday 2227.397
12 member Tuesday 2076.137
13 casual Wednesday 2226.972
14 member Wednesday 2091.090
Correct the weekday order
all_trips$weekday <- ordered(all_trips$weekday, levels=c ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday" ))
Re-run the previous two average codes for correct weekday order
aggregate (all_trips$ride_length ~ all_trips$usertype + all_trips$weekday, FUN = mean)
all_trips$usertype all_trips$weekday all_trips$ride_length
Console output
1 casual Monday 1413.7970
2 member Monday 712.3448
3 casual Tuesday 1231.2016
4 member Tuesday 700.8877
5 casual Wednesday 1197.2032
6 member Wednesday 704.7405
7 casual Thursday 1242.5742
8 member Thursday 715.6762
9 casual Friday 1298.8332
10 member Friday 727.8972
11 casual Saturday 1550.9885
12 member Saturday 838.9709
13 casual Sunday 1563.3295
14 member Sunday 825.7307
aggregate (all_trips$ride_distance ~ all_trips$usertype + all_trips$weekday, FUN = mean)
all_trips$usertype all_trips$weekday all_trips$ride_distance
Console output
1 casual Monday 2258.196
2 member Monday 2067.225
3 casual Tuesday 2227.397
4 member Tuesday 2076.137
5 casual Wednesday 2226.972
6 member Wednesday 2091.090
7 casual Thursday 2256.727
8 member Thursday 2096.145
9 casual Friday 2273.075
10 member Friday 2066.173
11 casual Saturday 2462.863
12 member Saturday 2230.627
13 casual Sunday 2418.339
14 member Sunday 2184.793
head(all_trips)
Console output
# A tibble: 6 Γ 21
**ride_id rideaβ¦ΒΉ started_at ended_at startβ¦Β² startβ¦Β³ end_sβ¦β΄ end_sβ¦β΅ startβ¦βΆ startβ¦β· end_lat end_lng usertβ¦βΈ ride_β¦βΉ date month day year weekdayΛ ride_β¦Λ**
<chr> <chr> <dttm> <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <date> <chr> <chr> <chr> <chr> <dbl>
1 F3E84A230AF⦠classi⦠2021-11-15 09:35:03 2021-11-15 09:42:08 Clark ⦠TA1307⦠Clark ⦠TA1309⦠42.0 -87.7 42.0 -87.7 casual 1941. 2021-11-15 11 15 2021 Monday 425
2 2A81E957DD2⦠classi⦠2021-11-06 19:14:10 2021-11-06 19:33:19 Larrab⦠TA1309⦠Michig⦠13042 41.9 -87.6 41.9 -87.6 casual 2524. 2021-11-06 11 06 2021 Saturd⦠1149
3 A57F8F504A0⦠electr⦠2021-11-18 11:58:24 2021-11-18 12:08:35 Michig⦠13042 Michig⦠13042 41.9 -87.6 41.9 -87.6 casual 23.7 2021-11-18 11 18 2021 Thursd⦠611
4 C79A70DFD07⦠electr⦠2021-11-23 22:14:11 2021-11-23 22:44:01 Sherid⦠604 Sherid⦠604 42.1 -87.7 42.1 -87.7 casual 6.63 2021-11-23 11 23 2021 Tuesday 1790
5 41E7F8FB564⦠electr⦠2021-11-05 16:48:10 2021-11-05 16:53:18 Clark ⦠TA1307⦠Clark ⦠TA1309⦠42.0 -87.7 42.0 -87.7 member 1932. 2021-11-05 11 05 2021 Friday 308
6 3DEAAA1701B⦠electr⦠2021-11-18 08:40:38 2021-11-18 08:48:56 Despla⦠TA1306⦠Dearbo⦠TA1305⦠41.9 -87.6 41.9 -87.6 member 1416. 2021-11-18 11 18 2021 Thursd⦠498
# β¦ with 1 more variable: weekday <ord>, and abbreviated variable names ΒΉrideable_type, Β²start_station_name, Β³start_station_id, β΄end_station_name, β΅end_station_id, βΆstart_lat,
# β·start_lng, βΈusertype, βΉride_distance, Λweekday, Λride_length
Analyse rider data by user type and weekday
all_trips %>%
+ mutate(weekday = wday(started_at)) %>% # Weekday field created using wday()
+ group_by (usertype, weekday) %>% # Usertype and weekday grouped
+ summarise(number_of_rides = n(), # Calculates number of rides
+ average_duration = mean(ride_length)) %>% # Calculates the average duration
+ arrange(usertype, weekday) # Sorts the data
Console output
`summarise()` has grouped output by 'usertype'. You can override using the `.groups` argument.
# A tibble: 14 Γ 4
# Groups: usertype [2]
**usertype weekday number_of_rides average_duration**
<chr> <dbl> <int> <dbl>
1 casual 1 280364 1563.
2 casual 2 197346 1414.
3 casual 3 182229 1231.
4 casual 4 189679 1197.
5 casual 5 211725 1243.
6 casual 6 233947 1299.
7 casual 7 347877 1551.
8 member 1 291628 826.
9 member 2 374390 712.
10 member 3 402895 701.
11 member 4 404541 705.
12 member 5 402490 716.
13 member 6 354495 728.
14 member 7 333745 839.
Analysed data set
all_trips_analysed <- all_trips %>%
+ group_by(usertype, weekday) %>%
+ summarise(number_of_rides = n(), average_duration = mean(ride_length), average_distance = mean(ride_distance)) %>%
+ arrange(usertype, weekday)
Console output
`summarise()` has grouped output by 'usertype'. You can override using the `.groups` argument.
> as_tibble(all_trips_analysed)
# A tibble: 14 Γ 5
**usertype weekday number_of_rides average_duration average_distance**
<chr> <ord> <int> <dbl> <dbl>
1 casual Monday 197346 1414. 2258.
2 casual Tuesday 182229 1231. 2227.
3 casual Wednesday 189679 1197. 2227.
4 casual Thursday 211725 1243. 2257.
5 casual Friday 233947 1299. 2273.
6 casual Saturday 347877 1551. 2463.
7 casual Sunday 280364 1563. 2418.
8 member Monday 374390 712. 2067.
9 member Tuesday 402895 701. 2076.
10 member Wednesday 404541 705. 2091.
11 member Thursday 402490 716. 2096.
12 member Friday 354495 728. 2066.
13 member Saturday 333745 839. 2231.
14 member Sunday 291628 826. 2185.
This data set provides valuable insight whereby member riders ride more on the weekdays, specifically on Wednesday. Whilst casual riders prefer to ride mostly during the weekends.
Share
This section is all about the information that has been discovered and to discuss the findings to help the stakeholders make an accurate informed decision. To visualise this, Tableau was used to create the diagrams and charts below. The dashboard can be viewed here.
User type distribution
- Over 50% of Cyclisticβs usertype are member riders
- Significant amount of customer loyalty
Bike preference
- Classic bikes are significantly dominated by member riders
- Docked bikes used by casual riders
- Similar usage of electric bikes by both user types
- Casual riders prefer classic bikes more
Popular riding days for both user types
- Member riders ride more on weekdays, notably peaking on Tuesdays
- Casual riders seen to be riding more on weekends
- Easily inferred that member riders use bikes for work whilst casual riders are for more recreational and occasional purposes
Popular riding months for both user types
- Bike usage increases after February and drops after July for casual riders
- Member riders are more consistent with their riders throughout the year
- Less frequent bike use for causal riders during winter months
Average ride length by weekday
- Casual riders ride almost twice as much more than member riders
- Saturday and Sunday have the highest ride length for both user types, particularly for casual riders
- Lower ride lengths by member riders due to work commute
Total number of rides per hour of the day
- Members riding hours usually peak at 7am - 9am and 5pm
- Casual riding hours gradually increases as the day goes on and dips after 5pm
Act
This section will cover the recommendations based on the discovered information and findings to the stakeholders. In order to be able to convert Casual riders into Cyclistic members:
- Increase marketing campaigns that are targeted at casual riders through in app notifications, texts and emails with exclusive member rider sign up discounts and benefits. These advertisements would then encourage switching.
- Increase single day and full-day pass prices to incentivise member rider sign-ups. With this being done, discounted member rider prices may attract casual riders to convert.
- Build a rewards system whereby points are added to member riders accounts each time they ride a certain distance. With increased ride frequency, points can be collected and then be exchanged to get a discount for the yearly subscription.
- Social media platforms should be utilised in order to cover all of the benefits member rider memberships have to offer.
- Offer a seasonal membership that allows people to ride bikes during peak seasons such as spring and summer. This membership would allow riders to have a sense of choice with no long term commitment for seasons where they would not ride. This membership can be priced strategically so that it is better to purchase the membership rather than paying for single day and full day passes for a year.
No further recommendations were made due to a lack in personally identifiable information, such as age, gender, payment and locational information. For Cyclistic to further meet its financial and marketing goals, in the foreseeable future it can focus on customising advertisements for their members based on hobbies, interests and ride usage.