Regression Models for Predicting Rail Transit Ridership at the Station Level



Hartig, Daniel



Ridership forecasts for new urban rail systems and extensions are often inaccurate. One study found that forecasts overestimated ridership by about 50%, on average, across a broad sample of urban rail systems worldwide. Most transit agencies in the United States do not base their ridership estimates on regression models. This thesis presents a framework for feature generation and regression modeling to estimate urban rail ridership in the United States. Features are generated from publicly available US Census Bureau data at the ZIP code level. Monte Carlo geographic sampling from ZIP code shapefiles generates a set of features for each station on a rail network, representing characteristics within walking distance of that station. Network connections and travel times are then used to generate a second set of features, representing characteristics within commuting distance of each station. Several models are developed using different regression types and compared in terms of accuracy and selected features. Some of the resulting models produce system-level ridership predictions within 20% of the true value for a sample of six US urban rail systems.
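
As an illustration of the walking-distance feature generation described in the abstract, the following is a minimal sketch and not the thesis's own implementation. It assumes planar coordinates in kilometers, represents each ZIP code area as a simple polygon paired with one feature value, and draws random points uniformly inside a circle around the station. The names `point_in_polygon`, `sample_station_feature`, the 0.8 km radius, and the toy zones are all hypothetical; a real pipeline would read the polygons from Census shapefiles with a GIS library.

```python
import math
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_station_feature(station, radius, zones, n_samples=10000, seed=0):
    """Estimate the area-weighted mean of a zone-level feature around a station.

    station: (x, y) location; radius: assumed walking distance;
    zones: list of (polygon, feature_value) pairs (hypothetical layout).
    Sample points that fall outside every zone contribute zero.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # sqrt on the radius makes the points uniform over the disk's area.
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        x = station[0] + r * math.cos(theta)
        y = station[1] + r * math.sin(theta)
        for polygon, value in zones:
            if point_in_polygon(x, y, polygon):
                total += value
                break
    return total / n_samples


# Toy example: two adjacent square zones with different population densities.
west = ([(-10, -10), (0, -10), (0, 10), (-10, 10)], 1000.0)
east = ([(0, -10), (10, -10), (10, 10), (0, 10)], 3000.0)

border_station = sample_station_feature((0.0, 0.0), 0.8, [west, east])
interior_station = sample_station_feature((5.0, 0.0), 0.8, [west, east])
```

In this toy setup, a station on the boundary between the two zones receives a value near the average of the two densities, while a station well inside one zone receives that zone's value. Commuting-distance features could be built in a similar spirit by aggregating the per-station values over stations reachable within a travel-time threshold, though the exact aggregation used in the thesis is not stated here.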



Regression, Urban transit, Feature generation, Monte Carlo