I moved to Columbus, Ohio, in the fall of 2018 for work. One of the things I really like about Columbus is the CoGo bike-sharing network. CoGo currently has 597 bikes and 72 stations spread across the city, and it was recently announced that the network will expand in 2020 with additional stations and new e-bikes.
I bought an annual subscription shortly after I moved here. There’s a station close to both my apartment and my office, which makes commuting easy. It’s also great because I hate driving, and it lets me get around the city without having to worry about parking.
If you have a membership, CoGo makes it easy to get all of your trip data from their website. (They also share real-time data and historical trip data for all riders. Data can be found here.) So I copied and pasted my personal data from their website to do some basic analysis of my riding habits.
Bike Riding Analysis
# load the necessary libraries
library(tidyverse)
library(janitor)
Reading in the data
# read in my riding data
cogo_data <- read_csv("https://raw.githubusercontent.com/trevinflick/blog/master/my_cogo_data.csv")
I also used Google Maps to look up the coordinates of every station I rode from or to.
# read in the station lookup table
cogo_lu <- read_csv("https://raw.githubusercontent.com/trevinflick/blog/master/station_lookup.csv")
# take a look at the data
glimpse(cogo_data)
## Observations: 214
## Variables: 6
## $ `Trip ID` <dbl> 411343, 411344, 411358, 411374, 411382, 411418, 41144…
## $ `Start Station` <chr> "Easton Square Pl & Townsfair Way", "Seward St & Wort…
## $ `Start Time` <chr> "1/5/20 11:11", "1/5/20 12:07", "1/6/20 8:06", "1/6/2…
## $ `End Station` <chr> "Seward St & Worth Ave", "Easton Square Pl & Townsfai…
## $ `End Time` <chr> "1/5/20 11:19", "1/5/20 12:13", "1/6/20 8:12", "1/6/2…
## $ Duration <chr> "7m 28s", "5m 34s", "6m 30s", "4m 59s", "5m 52s", "5m…
glimpse(cogo_lu)
## Observations: 28
## Variables: 3
## $ STATION <chr> "3rd St & Sycamore St", "4th St & Rich St", "Bicentennial Par…
## $ LAT <dbl> 39.94889, 39.95773, 39.95596, 39.95754, 39.97135, 40.04882, 3…
## $ LON <dbl> -82.99519, -82.99536, -83.00313, -82.99853, -83.00217, -82.91…
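The raw headers contain spaces and capital letters (`Trip ID`, `Start Station`, `STATION`), which is awkward to type in R. As a rough illustration of what cleaning them involves, here is a minimal base-R sketch. `simple_clean_names()` is a hypothetical helper, not part of janitor: it only lowercases and replaces runs of non-alphanumeric characters with underscores, and it does not handle everything `clean_names()` does (such as splitting camelCase or de-duplicating repeated names).

```r
# Hypothetical helper: a rough base-R approximation of snake_case column names
simple_clean_names <- function(x) {
  x <- tolower(trimws(x))            # "Trip ID" -> "trip id"
  x <- gsub("[^a-z0-9]+", "_", x)    # runs of spaces/punctuation -> "_"
  gsub("^_+|_+$", "", x)             # drop leading/trailing underscores
}

simple_clean_names(c("Trip ID", "Start Station", "Start Time",
                     "End Station", "End Time", "Duration"))
simple_clean_names(c("STATION", "LAT", "LON"))
```

For these headers the sketch gives `trip_id`, `start_station`, ..., `duration` and `station`, `lat`, `lon`, the same names the rest of the analysis uses.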
Data cleaning
Now we’ll clean up the riding data so it’s easier to analyze.
# clean up the column names with the janitor package
cogo_data <- clean_names(cogo_data)
cogo_lu <- clean_names(cogo_lu)
# separate date and time into two columns
cogo_data <- cogo_data %>%
  separate(col = start_time, into = c("start_date", "start_time"), sep = " ")
cogo_data <- cogo_data %>%
  separate(col = end_time, into = c("end_date", "end_time"), sep = " ")
# create a column for trip time in seconds
# duration looks like "7m 28s": take the number before "m" and the number before "s"
cogo_data <- cogo_data %>%
  mutate(minutes = as.numeric(str_extract(duration, "[0-9]+(?=m)")),
         seconds = as.numeric(str_extract(duration, "[0-9]+(?=s)")),
         trip_seconds = minutes * 60 + seconds)
head(cogo_data)
## # A tibble: 6 x 11
## trip_id start_station start_date start_time end_station end_date end_time
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 411343 Easton Squar… 1/5/20 11:11 Seward St … 1/5/20 11:19
## 2 411344 Seward St & … 1/5/20 12:07 Easton Squ… 1/5/20 12:13
## 3 411358 Lucas St & T… 1/6/20 8:06 Front St &… 1/6/20 8:12
## 4 411374 Front St & B… 1/6/20 11:52 Lucas St &… 1/6/20 11:57
## 5 411382 Lucas St & T… 1/6/20 12:57 Front St &… 1/6/20 13:03
## 6 411418 Front St & B… 1/6/20 16:37 Lucas St &… 1/6/20 16:43
## # … with 4 more variables: duration <chr>, minutes <dbl>, seconds <dbl>,
## # trip_seconds <dbl>
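The duration strings look like `7m 28s`. If you'd rather not rely on the minutes always coming first, here is a minimal base-R sketch that reads each unit by its letter instead of by position. `duration_to_seconds()` is a hypothetical helper, and the hours case (`h`) is an assumption: no trip in this data lasts an hour, so I don't know how CoGo would format one.

```r
# Hypothetical helper: total seconds from strings like "7m 28s".
# Each unit is read by its letter, so a missing unit counts as 0.
# The "h" (hours) case is an assumption about how longer trips would look.
duration_to_seconds <- function(x) {
  unit_value <- function(unit) {
    # regexec captures the digits immediately before the unit letter
    hits <- regmatches(x, regexec(paste0("([0-9]+)", unit), x))
    vapply(hits, function(h) if (length(h) == 2) as.numeric(h[2]) else 0,
           numeric(1))
  }
  unit_value("h") * 3600 + unit_value("m") * 60 + unit_value("s")
}

duration_to_seconds(c("7m 28s", "5m 34s", "6m 30s"))  # 448 334 390
duration_to_seconds("45s")                             # 45
```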
Initial Analysis
Okay, now that we’ve cleaned up some of the columns, we can dive into the data. Let’s start with my total number of trips.
# count unique trip IDs (group_by() alone doesn't change the row count)
cogo_data %>%
  distinct(trip_id) %>%
  nrow()
## [1] 214
I took 214 trips as a CoGo member, but I want to know what time frame that covers. To work with dates, we’ll load the lubridate package.
library(lubridate)
cogo_data$start_date <- mdy(cogo_data$start_date)
cogo_data$end_date <- mdy(cogo_data$end_date)
cogo_data %>%
  summarise(first_day = min(start_date),
            last_day = max(start_date))
## # A tibble: 1 x 2
## first_day last_day
## <date> <date>
## 1 2018-11-29 2020-01-07
So it looks like I began riding on November 29, 2018, and my last ride was on January 7, 2020. Now I want to count the number of trips by year.
cogo_data %>%
  count(year(start_date))
## # A tibble: 3 x 2
## `year(start_date)` n
## <dbl> <int>
## 1 2018 11
## 2 2019 193
## 3 2020 10
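Splitting the timestamps into separate date and time text columns works, but the raw `Start Time` and `End Time` strings can also be parsed directly into date-times. Here is a minimal base-R sketch using the first trip from the `glimpse()` output; it assumes every row uses the same `m/d/yy H:MM` format and that the times are Columbus (US Eastern) local time. Because the stamps are truncated to the minute, the elapsed time between them can differ from the reported duration by up to a minute: 8 minutes here versus `7m 28s`.

```r
# Parse CoGo's "m/d/yy H:MM" timestamps as US Eastern local time.
# Assumes every export row uses this exact format.
parse_cogo_time <- function(x) {
  as.POSIXct(x, format = "%m/%d/%y %H:%M", tz = "America/New_York")
}

# First trip from the glimpse() output (reported duration "7m 28s")
start <- parse_cogo_time("1/5/20 11:11")
end   <- parse_cogo_time("1/5/20 11:19")

# Elapsed minutes between the minute-resolution stamps
elapsed_min <- as.numeric(difftime(end, start, units = "mins"))
elapsed_min            # 8

# Year and calendar date come straight from the parsed timestamp
format(start, "%Y")    # "2020"
as.Date(start, tz = "America/New_York")
```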
2019 Analysis
I took the bulk of my trips in 2019, so for the rest of the analysis we’ll focus on that year.
cogo_2019 <- cogo_data %>%
  filter(year(start_date) == 2019)
Plotting!
Let’s take a look at which days of the week and which months I tend to use CoGo.
cogo_2019$day <- weekdays(cogo_2019$start_date)
library(ggthemes) # for a pretty graph
days <- c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
cogo_2019 %>%
  ggplot(aes(x = day)) +
  geom_bar() +
  scale_x_discrete(limits = days) +
  labs(x = "",
       y = "",
       title = "Most rides in 2019 occurred on Saturday") +
  theme_fivethirtyeight()
# month.name is R's built-in vector of English month names, January to December
start_month <- month.name
cogo_2019 %>%
  # month() returns a number, so map it to a name and make a factor in calendar
  # order; scale_x_discrete() needs a discrete x, not a numeric one
  ggplot(aes(x = factor(month.name[month(start_date)], levels = start_month))) +
  geom_bar() +
  scale_x_discrete(limits = start_month) +
  labs(x = "",
       y = "",
       title = "May was the most popular month in 2019") +
  theme_fivethirtyeight(base_size = 9)
It makes sense that most of my rides occurred in the spring and summer, when the weather is warmer. It’s a little surprising that I rode 10 times in January (more than February and March combined).
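`month()` on its own returns numbers (1 to 12), while a month-name axis needs categories in calendar order. Here is a minimal base-R sketch of building such a factor, using a few hypothetical ride dates rather than my actual data. Mapping the month number through R's built-in `month.name` avoids depending on the system locale, and a factor with all twelve levels keeps months with zero rides in the count.

```r
# Hypothetical ride dates, for illustration only
ride_dates <- as.Date(c("2019-01-10", "2019-05-03", "2019-05-17", "2019-07-04"))

# Month number as an integer ("%m" is locale-independent), then an English name
month_num <- as.integer(format(ride_dates, "%m"))
ride_month <- factor(month.name[month_num], levels = month.name)

# All 12 months appear, in calendar order, including those with zero rides
rides_per_month <- table(ride_month)
rides_per_month
```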
Next I want to see how long I typically ride on each trip. For this exercise I’ll look at the minutes column. (Since I’m ignoring the seconds column, we have to think of the minutes as bins: 0:00-0:59 is 0 minutes, 1:00-1:59 is 1 minute, 2:00-2:59 is 2 minutes, etc.)
cogo_2019 %>%
  ggplot(aes(x = minutes)) +
  geom_bar() +
  scale_x_continuous(breaks = seq(0, 30, by = 5)) +
  labs(x = "Trip duration in minutes",
       y = "",
       title = "Most trips are around 6 to 7 minutes") +
  theme_light()
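Treating the minutes column as bins amounts to integer division of the total seconds by 60. A small base-R illustration with hypothetical trip lengths (448 seconds is the `7m 28s` trip from the first row):

```r
# Hypothetical trip lengths in seconds (448 s is "7m 28s", 2250 s is 37m 30s)
trip_seconds <- c(59, 60, 119, 448, 2250)

# %/% is integer division: leftover seconds are dropped, so
# 0-59 s -> 0, 60-119 s -> 1, ..., matching the minutes column
minute_bin <- trip_seconds %/% 60
minute_bin   # 0 1 1 7 37
```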
A majority of my trips are under 10 minutes. A CoGo membership includes unlimited trips under 30 minutes, and you pay extra for any ride over 30 minutes. In 2019 I had one trip over 30 minutes, so let’s take a closer look at it.
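Checking the whole-minute column against 30 misses a trip of, say, 30 minutes 15 seconds, because its minutes value is exactly 30. Comparing total seconds avoids that. A minimal base-R sketch with hypothetical trip lengths; treating exactly 30:00 as *not* over the limit is an assumption about how CoGo counts.

```r
# Hypothetical trips, in seconds: 7m 28s, exactly 30m, 30m 15s, 37m 30s
trip_seconds <- c(448, 1800, 1815, 2250)
limit_seconds <- 30 * 60   # 30 minutes of included ride time

whole_minutes <- trip_seconds %/% 60
over_by_minutes <- whole_minutes > 30             # FALSE FALSE FALSE TRUE
over_by_seconds <- trip_seconds > limit_seconds   # FALSE FALSE TRUE TRUE
```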
cogo_2019 %>%
  filter(trip_seconds > 30 * 60)   # compare total seconds, not whole minutes
## # A tibble: 1 x 12
## trip_id start_station start_date start_time end_station end_date end_time
## <dbl> <chr> <date> <chr> <chr> <date> <chr>
## 1 383502 Lane Ave at … 2019-06-29 10:13 North Bank… 2019-06-29 10:51
## # … with 5 more variables: duration <chr>, minutes <dbl>, seconds <dbl>,
## # trip_seconds <dbl>, day <chr>
This trip lasted 37 minutes. I remember it because I ran 5 miles with my friend to Ohio State’s campus and then biked back on a CoGo bike while he ran the rest of the way back (I’m a great friend).
Station Location Analysis
Now that we’ve looked at when I typically ride and how long my trips usually take, I want to look at where I ride.
# top 10 starting stations
cogo_2019 %>%
  group_by(start_station) %>%
  count() %>%
  arrange(desc(n)) %>%
  head(10)
## # A tibble: 10 x 2
## # Groups: start_station [10]
## start_station n
## <chr> <int>
## 1 Lucas St & Town St 81
## 2 Front St & Beck St 51
## 3 Neil Ave & Nationwide Blvd 17
## 4 Summit St & 17th Ave 8
## 5 3rd St & Sycamore St 7
## 6 High St & Warren 3
## 7 Schiller Park - Stewart Ave 3
## 8 4th St & Rich St 2
## 9 Bicentennial Park 2
## 10 Columbus Commons - Rich St 2
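These top-10 tables are cut at exactly ten rows, so when stations tie at the cutoff (this list ends with three stations at 2 starts each), which ones make the list is arbitrary. Here is a minimal base-R sketch that keeps every station tied with the k-th count. `top_k_with_ties()` is a hypothetical helper, the counts come from the start-station table above, and `Station X` is a made-up eleventh station added only to create a tie. (On an ungrouped count table, dplyr's `slice_max()` keeps ties by default.)

```r
# Hypothetical helper: keep every name whose count ties the k-th largest count
top_k_with_ties <- function(counts, k) {
  counts <- sort(counts, decreasing = TRUE)
  if (length(counts) <= k) return(counts)
  counts[counts >= counts[k]]
}

# Counts from the start-station table, plus a made-up "Station X" tied at 2
start_counts <- c("Lucas St & Town St" = 81, "Front St & Beck St" = 51,
                  "Neil Ave & Nationwide Blvd" = 17, "Summit St & 17th Ave" = 8,
                  "3rd St & Sycamore St" = 7, "High St & Warren" = 3,
                  "Schiller Park - Stewart Ave" = 3, "4th St & Rich St" = 2,
                  "Bicentennial Park" = 2, "Columbus Commons - Rich St" = 2,
                  "Station X" = 2)

top_k_with_ties(start_counts, 10)   # 11 stations, because of the tie at 2
```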
# top 10 ending stations
cogo_2019 %>%
  group_by(end_station) %>%
  count() %>%
  arrange(desc(n)) %>%
  head(10)
## # A tibble: 10 x 2
## # Groups: end_station [10]
## end_station n
## <chr> <int>
## 1 Lucas St & Town St 92
## 2 Front St & Beck St 40
## 3 Neil Ave & Nationwide Blvd 14
## 4 3rd St & Sycamore St 8
## 5 Summit St & 17th Ave 8
## 6 High St & Warren 5
## 7 Bicentennial Park 4
## 8 Nationwide Arena - Front St 3
## 9 Schiller Park - Stewart Ave 3
## 10 Topiary Park - Town St 3
# top 10 A to B trips
cogo_2019 %>%
  group_by(start_station, end_station) %>%
  count() %>%
  arrange(desc(n)) %>%
  head(10)
## # A tibble: 10 x 3
## # Groups: start_station, end_station [10]
## start_station end_station n
## <chr> <chr> <int>
## 1 Front St & Beck St Lucas St & Town St 46
## 2 Lucas St & Town St Front St & Beck St 38
## 3 Neil Ave & Nationwide Blvd Lucas St & Town St 15
## 4 Lucas St & Town St Neil Ave & Nationwide Blvd 11
## 5 Lucas St & Town St 3rd St & Sycamore St 7
## 6 Lucas St & Town St Summit St & 17th Ave 7
## 7 3rd St & Sycamore St Lucas St & Town St 6
## 8 Summit St & 17th Ave Lucas St & Town St 5
## 9 Lucas St & Town St High St & Warren 3
## 10 Lucas St & Town St Nationwide Arena - Front St 3
Here are my most popular starting and ending stations, as well as my most popular routes. My two most popular stations are Front St & Beck St and Lucas St & Town St, which makes sense since they are the stations closest to my apartment and my work. Trips between these two stations, in either direction, are also my two most popular routes. The Neil Ave & Nationwide Blvd station is my third most popular destination; I often used it to get to baseball games or concerts.
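Since my top two routes are the same pair of stations in opposite directions, it can help to count routes regardless of direction. Here is a minimal base-R sketch using four rows from the route table above: sorting each station pair alphabetically with `pmin()`/`pmax()` puts A-to-B and B-to-A trips in the same group before summing.

```r
# Four rows from the top-10 route table
routes <- data.frame(
  start_station = c("Front St & Beck St", "Lucas St & Town St",
                    "Neil Ave & Nationwide Blvd", "Lucas St & Town St"),
  end_station   = c("Lucas St & Town St", "Front St & Beck St",
                    "Lucas St & Town St", "Neil Ave & Nationwide Blvd"),
  n = c(46, 38, 15, 11),
  stringsAsFactors = FALSE
)

# Put the two station names in a fixed (alphabetical) order
routes$station_a <- pmin(routes$start_station, routes$end_station)
routes$station_b <- pmax(routes$start_station, routes$end_station)

# Sum trips per undirected pair, largest first
undirected <- aggregate(n ~ station_a + station_b, data = routes, FUN = sum)
undirected[order(-undirected$n), ]
# Front St & Beck St <-> Lucas St & Town St: 84
# Lucas St & Town St <-> Neil Ave & Nationwide Blvd: 26
```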
Mapping!
library(ggmap)
cogo_start_19 <- left_join(cogo_2019, cogo_lu, by = c("start_station" = "station"))
cogo_end_19 <- left_join(cogo_2019, cogo_lu, by = c("end_station" = "station"))
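A left join keeps every trip, so any station name that doesn't exactly match the lookup table ends up with `NA` coordinates, and the map would typically drop those points with a warning. A minimal base-R sketch of checking for that: the two lookup rows are taken from the `glimpse()` output, but the tiny `trips` table and the trimmed `lookup` are hypothetical. (In dplyr, `anti_join()` answers the same question.)

```r
# Hypothetical trips; "Lucas St & Town St" is deliberately left out of the lookup
trips <- data.frame(
  start_station = c("3rd St & Sycamore St", "4th St & Rich St", "Lucas St & Town St"),
  stringsAsFactors = FALSE
)
# Two rows from the station lookup table
lookup <- data.frame(
  station = c("3rd St & Sycamore St", "4th St & Rich St"),
  lat = c(39.94889, 39.95773),
  lon = c(-82.99519, -82.99536),
  stringsAsFactors = FALSE
)

# all.x = TRUE makes this a left join, like left_join()
joined <- merge(trips, lookup, by.x = "start_station", by.y = "station",
                all.x = TRUE)

# Stations that found no coordinates in the lookup
missing <- unique(joined$start_station[is.na(joined$lat)])
missing   # "Lucas St & Town St"
```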
A map of my starting locations in 2019.
qmplot(lon, lat, data = cogo_start_19, maptype = "toner-background",
       geom = c("point", "density2d"), color = I("red"))
A map of all my ending stations in 2019.
qmplot(lon, lat, data = cogo_end_19, maptype = "toner-background",
       geom = c("point", "density2d"), color = I("red"))
Fin
Okay, that’s it for now. If you’re new to R, I hope this was useful and that you learned something along the way. Here are some awesome additional R resources to check out:
Another useful resource and community is Tidy Tuesday. It’s a great community of R learners who analyze a new data set every week. Check out the link and #TidyTuesday on Twitter.
Keep your eye out for more content on this site!