Authors: Ke Jiang, Hansen Feng, Wenglei Wu, Yifan Liu

Editor: Wenglei Wu

Created on Nov 24, 2024; last modified on Nov 24, 2024.

Note: this post summarizes the corresponding research paper and includes only selected content.

Introduction

Bike sharing is gaining popularity around the world, but balancing supply and demand across stations remains difficult. Our goal was to develop models that predict bike station demand within a given time interval. We observed that past studies rely only on public datasets or on system data from bike sharing companies. We argue that cycling demand depends not only on past trip records, but also on the geographical characteristics of bike stations, the temporal characteristics of when a trip takes place, and the weather conditions. We therefore extended the Capital Bikeshare dataset by integrating data from multiple sources.

Main Contributions

  1. Our OLAP data cube combines Washington, DC data from multiple sources and extends the Capital Bikeshare system data. Future researchers can reuse it to build datasets of interest.
  2. We visualize the cycling patterns in Washington, DC from different perspectives.
  3. We identify several issues in the data that may hinder or inspire model development, both in our study and for future researchers who use our dataset.
  4. We evaluate and compare the performance of five models for predicting the outflow of bike stations in Washington, DC: linear regression, a fully connected network, ARIMA, LSTM+Attention, and XGBoost.
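To make the prediction target concrete, here is a minimal sketch of how station outflow per time interval could be derived from raw trip records. The record layout (pairs of start station ID and start time), the function names, and the 5-minute bin width are assumptions for illustration, not the schema or pipeline used in the paper.

```python
from collections import Counter
from datetime import datetime


def floor_to_interval(ts: datetime, minutes: int = 5) -> datetime:
    """Round a timestamp down to the start of its interval.

    Assumes `minutes` divides 60 evenly so that bins align with the hour.
    """
    floored_minute = ts.minute - ts.minute % minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def station_outflow(trips, minutes: int = 5) -> Counter:
    """Count departures per (station, interval start).

    `trips` is an iterable of (start_station_id, start_time) pairs.
    This layout is hypothetical and chosen only for the example.
    """
    counts = Counter()
    for station_id, started_at in trips:
        counts[(station_id, floor_to_interval(started_at, minutes))] += 1
    return counts


# Toy data: three departures from station "A", one from station "B".
trips = [
    ("A", datetime(2024, 1, 1, 8, 1)),
    ("A", datetime(2024, 1, 1, 8, 4, 59)),
    ("A", datetime(2024, 1, 1, 8, 5)),
    ("B", datetime(2024, 1, 1, 8, 2)),
]
outflow = station_outflow(trips)
```

Each key of `outflow` identifies one station in one interval, and its value is the number of trips that started there. A model would then be trained to predict these counts for future intervals.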

Data Source and Summary

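| Table   | Source                                                | # Records  | Interval  |
|---------|-------------------------------------------------------|------------|-----------|
| Trip    | Capital Bikeshare Trip History Data                   | 35,204,419 | /         |
| Station | Capital Bikeshare Trip History Data, Open Data DC     | 741        | /         |
| Weather | OpenWeatherMap                                        | 116,616    | 1 Hour    |
| Time    | Python holidays Package                               | 1,472,544  | 5 Minutes |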
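As an illustration of the Time table, the sketch below generates one row per 5-minute interval and flags weekends, holidays, and working days. The column names and the function are hypothetical. To stay dependency-free, the holiday dates are written by hand here rather than taken from the Python holidays package that the post lists as the source.

```python
from datetime import date, datetime, timedelta


def build_time_dimension(start, end, holiday_dates, step_minutes=5):
    """Return one row per interval in the half-open range [start, end).

    `holiday_dates` is any container of `datetime.date` objects that
    supports the `in` operator. Column names are illustrative only.
    """
    rows = []
    step = timedelta(minutes=step_minutes)
    current = start
    while current < end:
        is_weekend = current.weekday() >= 5  # Monday=0, so Saturday=5, Sunday=6
        is_holiday = current.date() in holiday_dates
        rows.append({
            "timestamp": current,
            "hour": current.hour,
            "day_of_week": current.weekday(),
            "is_holiday": is_holiday,
            "is_working_day": not (is_weekend or is_holiday),
        })
        current += step
    return rows


# Hand-written stand-in for a holiday calendar (assumption for the example).
holiday_dates = {date(2024, 1, 1)}  # New Year's Day

rows = build_time_dimension(
    datetime(2024, 1, 1), datetime(2024, 1, 3), holiday_dates
)
```

At a 5-minute step each day contributes 288 rows, so the 1,472,544 records in the Time table correspond to 5,113 days. In practice the hand-written set could be swapped for a calendar object from the holidays package, since the function only needs membership tests on dates.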

Figure: conceptual design (conceptual-design.jpg)

Visualization

Temporal

Hourly Average Trip Count - Working Day vs. Weekend and Holiday