School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore.
Sensors (Basel). 2022 Dec 25;23(1):204. doi: 10.3390/s23010204.
Even with ubiquitous sensing data in intelligent transportation systems, such as mobile sensing of vehicle trajectories, traffic estimation still faces a missing-data problem caused by detector faults or by the limited number of probe vehicles acting as mobile sensors. Missing data are an obstacle to many downstream tasks, e.g., link-based traffic status modeling. Although many studies have tackled this problem, they mainly address the case in which data are missing at random and ignore differences among the links on which data are missing. In practice, traffic speed data are always missing not at random (MNAR), and how recovery quality differs across links has not yet been studied. In this paper, we propose a general linear model based on probabilistic principal component analysis (PPCA) for imputing MNAR traffic speed data. Furthermore, we propose a metric, the Pearson score (p-score), for distinguishing links, and we investigate how the model performs on links with different p-score values. Experimental results show that the new model outperforms the commonly used PPCA model and that missing data on links with higher p-score values are recovered more accurately.
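To make the imputation setting concrete, the sketch below fills missing entries of a speed matrix by repeated low-rank (PCA) reconstruction. It is only a simplified stand-in for the kind of PPCA baseline the abstract compares against, not the MNAR model proposed in the paper, and it does not compute the p-score, whose definition is not given here. The function name `iterative_pca_impute`, the matrix layout (rows are time steps, columns are links), and all parameter values are assumptions made for illustration.

```python
import numpy as np

def iterative_pca_impute(X, n_components=2, n_iter=50, tol=1e-6):
    """Fill NaN entries of X (time steps x links) by repeated low-rank reconstruction.

    Illustrative baseline only; assumes every link has at least one observation.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialise missing entries with the per-link mean of the observed speeds.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        # Centre the data, then keep the leading principal components.
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        # Overwrite only the missing entries; observed speeds stay untouched.
        updated = np.where(missing, recon, X)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled

# Hypothetical toy data: 6 time steps, 3 links, speeds in km/h.
speeds = np.array([
    [52.0, 48.0, 61.0],
    [50.0, np.nan, 59.0],
    [47.0, 44.0, 55.0],
    [np.nan, 40.0, 51.0],
    [41.0, 38.0, np.nan],
    [39.0, 36.0, 46.0],
])
imputed = iterative_pca_impute(speeds, n_components=1)
```

Because this procedure treats every missing entry the same way, it implicitly assumes that missingness is unrelated to the unobserved speeds, which is exactly the assumption the abstract argues does not hold for real traffic data.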