Minimally-Congested Travel Time Prediction from Sparse Open Data
Travel time prediction underpins many transport planning processes and research problems. It is key to understanding urban accessibility, transport mode and route choices, and location decisions. Predicting congested travel times (i.e., when traffic congestion impedes flows) requires real-time data on traffic conditions. Such data are proprietary, expensive, and beyond the reach of many planning practitioners and scholars. But even uncongested travel time prediction requires a detailed model of the street network, speed limits, turn penalties, traffic signals, and stop signs. Though imperfect under congested conditions, such predictions offer far more accurate travel times and shortest paths than the more common alternatives, such as minimizing distance traveled or minimizing traversal time at each edge's maximum speed.
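To make the contrast between the two common baselines concrete, here is a minimal sketch that runs Dijkstra's algorithm on a hypothetical four-node network under two edge weights: length alone, and freeflow traversal time (length divided by speed limit). The graph, the attribute names `length` and `speed_kph`, and the helper functions are illustrative assumptions, not the study's code or data.

```python
import heapq

def shortest_path(graph, source, target, weight):
    """Dijkstra's algorithm over an adjacency dict.

    graph maps node -> list of (neighbor, edge_attrs); weight maps edge_attrs -> cost.
    Assumes target is reachable from source (raises KeyError otherwise).
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, attrs in graph.get(u, []):
            nd = d + weight(attrs)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk predecessors back from the target to recover the path
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

def distance(attrs):
    return attrs["length"]  # meters

def freeflow_time(attrs):
    # Edge length divided by speed limit (km/h converted to m/s), in seconds
    return attrs["length"] / (attrs["speed_kph"] / 3.6)

# Hypothetical toy network: a short residential route vs. a longer arterial route
graph = {
    "A": [("B", {"length": 800, "speed_kph": 40}), ("C", {"length": 1000, "speed_kph": 80})],
    "B": [("D", {"length": 800, "speed_kph": 40})],
    "C": [("D", {"length": 1000, "speed_kph": 80})],
}

print(shortest_path(graph, "A", "D", distance))  # (['A', 'B', 'D'], 1600.0)
path, secs = shortest_path(graph, "A", "D", freeflow_time)
print(path, round(secs, 1))  # ['A', 'C', 'D'] 90.0
```

Dijkstra suits both baselines because lengths and traversal times are non-negative. Note that neither weight accounts for turns or traffic controls, which is the gap the paragraph points to.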
This study addresses this challenge with a machine learning model for more accurate uncongested travel time prediction. We model the street network of Los Angeles County from OpenStreetMap and sample more than 40,000 empirical real-world trips as origin-destination (OD) pairs, all from open data. Using open-source software, we solve each OD pair’s shortest path by minimizing freeflow traversal time (i.e., the quotient of graph edge length and speed limit) and count the number of traffic controls (traffic signals, stop signs, and crossings) and turns (left, slight left, right, slight right, and U-turn) along the path. The path's freeflow traversal time provides an initial “naïve travel time” prediction. Next, we calculate uncongested travel times via the Google Routes API at 03:00 (when traffic congestion is minimal) as the empirical gold standard; this represents a one-off, free, but proprietary data collection. Then we train a random forest regression model to predict Google travel times, using our naïve times, turn counts, and traffic control counts along the path as features.
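A minimal sketch of the per-path feature extraction, assuming the shortest path is already solved, each node has planar coordinates in meters, and some nodes carry an OpenStreetMap `highway` tag. The tag values `traffic_signals`, `stop`, and `crossing` are standard OSM `highway=*` node values. The turn-angle thresholds (`STRAIGHT_MAX`, `SLIGHT_MAX`, `TURN_MAX`) and all function names are hypothetical choices for illustration, since the study's own turn-classification rules are not given here.

```python
import math

# Hypothetical heading-change thresholds in degrees (not taken from the study)
STRAIGHT_MAX = 15   # below this, treat as going straight
SLIGHT_MAX = 60     # below this, a slight turn
TURN_MAX = 150      # at or above this, a U-turn

# OSM highway=* node values mapped to feature names
CONTROL_TAGS = {"traffic_signals": "signal", "stop": "stop", "crossing": "crossing"}
FEATURE_KEYS = ["naive_time", "left", "slight_left", "right", "slight_right",
                "u_turn", "signal", "stop", "crossing"]

def bearing(p, q):
    """Compass bearing in degrees from planar point p to q (x east, y north)."""
    return math.degrees(math.atan2(q[0] - p[0], q[1] - p[1])) % 360

def classify_turn(delta):
    """Map a signed heading change (positive = clockwise = rightward) to a turn class, or None."""
    mag = abs(delta)
    if mag < STRAIGHT_MAX:
        return None
    if mag >= TURN_MAX:
        return "u_turn"
    side = "right" if delta > 0 else "left"
    return f"slight_{side}" if mag < SLIGHT_MAX else side

def path_features(path, coords, node_tags, edge_time):
    """One feature row: summed freeflow time plus turn and traffic-control counts.

    path: node ids in order; coords: node -> (x, y) in meters;
    node_tags: node -> OSM highway tag value (may be missing);
    edge_time: (u, v) -> freeflow traversal time in seconds.
    """
    feats = dict.fromkeys(FEATURE_KEYS, 0)
    feats["naive_time"] = sum(edge_time[(u, v)] for u, v in zip(path, path[1:]))
    for a, b, c in zip(path, path[1:], path[2:]):
        # Heading change at node b, wrapped into [-180, 180)
        delta = (bearing(coords[b], coords[c]) - bearing(coords[a], coords[b]) + 180) % 360 - 180
        turn = classify_turn(delta)
        if turn:
            feats[turn] += 1
    for node in path:
        control = CONTROL_TAGS.get(node_tags.get(node))
        if control:
            feats[control] += 1
    return feats

# Hypothetical path: north, then right at B, then left at C
coords = {"A": (0, 0), "B": (0, 100), "C": (100, 100), "D": (100, 200)}
tags = {"B": "traffic_signals", "C": "stop"}
times = {("A", "B"): 9.0, ("B", "C"): 9.0, ("C", "D"): 9.0}
row = path_features(["A", "B", "C", "D"], coords, tags, times)
print(row)  # one right turn, one left turn, one signal, one stop; naive_time 27.0
```

Each such row, paired with the corresponding Google travel time as the target, would form one training example for a regressor such as scikit-learn's `RandomForestRegressor`. The study's hyperparameters are not stated in this section.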
Our results show a substantial improvement over traditional, naïve methods of travel time prediction from free open data. For example, our predicted times and the Google times have a difference-in-means of just 0.38 seconds (p=0.75), explaining 93% of the variance in the latter. In comparison, the initial naïve times and the Google times have a difference-in-means of 182.7 seconds (p