Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework.

Affiliation

Key Laboratory of Road and Traffic Engineering of the State Ministry of Education, College of Transportation Engineering, Tongji University, Shanghai 201804, China.

Publication information

Int J Environ Res Public Health. 2021 Jul 15;18(14):7534. doi: 10.3390/ijerph18147534.

Abstract

Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Because driving data are class-imbalanced, high-risk samples, as the minority class, are usually handled poorly by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling method, cost-sensitive loss function, and probability calibration method to handle the class-imbalance problem in recognizing risky drivers. The hyperparameters that control the sampling ratio and class weight, along with the other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed automated learning framework, we establish a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky drivers. The inputs of the recognition model are the discrete Fourier transform coefficients of the target vehicle's longitudinal speed, lateral speed, and gap to its preceding vehicle. Among 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of automated machine learning is consistent with that of manual searching but is much more computationally efficient. We find that the combination of Support Vector Machine-based Synthetic Minority Oversampling TEchnique (SVMSMOTE) sampling, a cost-sensitive cross-entropy loss function, and isotonic regression can significantly improve the recognition ability and reduce the error of the predicted probability.
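Two of the ingredients named in the abstract, discrete Fourier transform inputs and a cost-sensitive cross-entropy loss, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `dft_magnitudes` and `weighted_cross_entropy`, the use of coefficient magnitudes rather than complex values, the normalization by signal length, and the fixed `pos_weight` are all assumptions made for illustration. In the paper the class weight is tuned by Bayesian optimization rather than set by hand.

```python
import cmath
import math


def dft_magnitudes(signal, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of a real-valued series.

    Assumption: magnitudes scaled by 1/N are used as features; the abstract
    only says "discrete Fourier transform coefficients".
    """
    n = len(signal)
    features = []
    for k in range(n_coeffs):
        # Direct O(N) evaluation of the k-th DFT coefficient.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        features.append(abs(coeff) / n)
    return features


def weighted_cross_entropy(y_true, p_pred, pos_weight, eps=1e-12):
    """Binary cross-entropy in which positive (risky) samples weigh pos_weight.

    Assumption: negatives have weight 1 and the loss is normalized by the
    total weight; other normalizations are equally plausible.
    """
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


if __name__ == "__main__":
    # Hypothetical longitudinal-speed samples (m/s) for one vehicle.
    speed = [30.0, 30.5, 31.0, 30.5, 30.0, 29.5, 29.0, 29.5]
    print(dft_magnitudes(speed, n_coeffs=4))

    # One risky driver among four; pos_weight=22 is a hand-picked value.
    labels = [1, 0, 0, 0]
    probs = [0.3, 0.1, 0.2, 0.05]
    print(weighted_cross_entropy(labels, probs, pos_weight=22.0))
```

With only 4.29% of drivers labeled risky, an inverse-frequency weight would be about 22, which is why that value appears in the usage example. It is only a plausible starting point for the search, not a value reported in the abstract.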

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc7a/8305749/ade4678a506d/ijerph-18-07534-g001.jpg
