Xian Tingsen Tim, Chin Teck Kean, Marks Benjy, Nelson John D, Moylan Emily
The University of Sydney, School of Civil Engineering, NSW, 2006, Australia.
The University of Sydney, Institute of Transport and Logistics Studies, NSW, 2006, Australia.
Sci Data. 2024 Sep 27;11(1):1034. doi: 10.1038/s41597-024-03873-1.
Public transport authorities generate large quantities of data as part of their daily operations, including vehicle positions, arrival times, and dynamic routing options. This information provides essential feedback to the system for planning routes and timetables, modelling growth in passenger demand, and evaluating operational performance. The General Transit Feed Specification (GTFS) is a data format that allows public transport data to be consumed by a wide variety of software applications. However, widespread use of GTFS data is hindered by location-specific data extensions, formats that are not human-readable, and the need for error cleaning. This paper describes a flexible dataset of actual bus arrival and departure times, created with a pipeline for GTFS Realtime feeds that is designed to address these challenges. The paper describes the pipeline, verifies the quality of the data, and presents 25 months of actual bus arrival and departure times for Sydney, Australia. We conclude by discussing the relevance of the pipeline outputs in general, and of the sample data specifically, to researchers and practitioners.
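To illustrate what turning GTFS Realtime data into human-readable arrival and departure times can involve, here is a minimal sketch that flattens TripUpdate snapshots into one row per trip and stop, keeping the most recent observed times and applying one simple cleaning rule. It is not the authors' pipeline. It assumes the feed snapshots have already been decoded from protocol buffers into Python dicts that mirror the GTFS Realtime field names (`header.timestamp`, `trip_update`, `stop_time_update`, `stop_sequence`, `arrival.time`, `departure.time`); the function names `flatten_trip_updates` and `to_iso` and the "departure before arrival" rule are hypothetical choices made for illustration.

```python
from datetime import datetime, timezone


def to_iso(posix):
    """Convert a POSIX timestamp (seconds) to an ISO 8601 UTC string, or None."""
    if posix is None:
        return None
    return datetime.fromtimestamp(posix, tz=timezone.utc).isoformat()


def flatten_trip_updates(snapshots):
    """Flatten decoded GTFS Realtime snapshots into one row per (trip_id, stop_sequence).

    Assumption: each snapshot is a dict shaped like a decoded FeedMessage,
    e.g. {"header": {"timestamp": int}, "entity": [{"trip_update": {...}}]}.
    Snapshots are processed oldest first, so later non-null times overwrite
    earlier ones for the same stop visit.
    """
    latest = {}
    for snap in sorted(snapshots, key=lambda s: s["header"]["timestamp"]):
        for entity in snap.get("entity", []):
            tu = entity.get("trip_update")
            if tu is None:
                continue  # skip vehicle positions, alerts, etc.
            trip_id = tu["trip"]["trip_id"]
            for stu in tu.get("stop_time_update", []):
                arr = stu.get("arrival", {}).get("time")
                dep = stu.get("departure", {}).get("time")
                key = (trip_id, stu["stop_sequence"])
                row = latest.setdefault(key, {
                    "trip_id": trip_id,
                    "stop_sequence": stu["stop_sequence"],
                    "stop_id": stu.get("stop_id"),
                    "arrival": None,
                    "departure": None,
                })
                # Keep the most recent non-null observation of each time.
                if arr is not None:
                    row["arrival"] = arr
                if dep is not None:
                    row["departure"] = dep

    rows = []
    for key in sorted(latest):
        row = latest[key]
        arr, dep = row["arrival"], row["departure"]
        if arr is None and dep is None:
            continue  # nothing observed for this stop visit
        # Hypothetical cleaning rule: drop stop visits that depart before arriving.
        if arr is not None and dep is not None and dep < arr:
            continue
        rows.append({**row, "arrival": to_iso(arr), "departure": to_iso(dep)})
    return rows
```

Times are rendered in UTC here to keep the sketch dependency-free; a Sydney dataset would more naturally present them in the `Australia/Sydney` time zone. A real pipeline would also decode the protobuf payload first, for example with the official GTFS Realtime language bindings.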