Addressing data imbalance in collision risk prediction with active generative oversampling.

Author information

Li Li, Zhang Xiaoliang

Affiliation

Information Engineering School, Jiaozuo Normal College, Jiaozuo, 454000, China.

Publication information

Sci Rep. 2025 Mar 17;15(1):9133. doi: 10.1038/s41598-025-93851-3.

DOI: 10.1038/s41598-025-93851-3

PMID: 40097620

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11914271/

Abstract

Data imbalance is a critical factor affecting predictive accuracy in collision risk assessment. This study proposes an active generative oversampling method that combines Query by Committee (QBC) with an Auxiliary Classifier Generative Adversarial Network (ACGAN), integrated with the Wasserstein Generative Adversarial Network (WGAN) framework. The method selectively enriches the minority class through QBC and diversity metrics, increasing the diversity of the generated samples and thereby improving the performance of fault classification algorithms. By assigning the selected samples the same labels as real samples, we increase the accuracy of the discriminator, forcing the generator to produce more diverse outputs, which is expected to improve classification results. We also propose a method that dynamically adjusts the number of training epochs of the generator and the discriminator according to the difference between their losses, keeping the two models balanced during training. Empirical analysis on four publicly available imbalanced datasets shows that our method outperforms existing methods in precision, recall, F-measure, and G-mean. Specifically, our method scores above 0.92 on all evaluation metrics, an average improvement of 23% to 28.3% over ENN, the worst-performing baseline. This indicates that our method has a significant advantage in handling data imbalance: it identifies collision samples more accurately and reduces the misclassification rate of non-collision samples.
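
The abstract does not say which disagreement or diversity measure is used, so the following is only a minimal Python sketch under assumed choices, not the authors' implementation: committee disagreement is measured by vote entropy, diversity is the Euclidean distance to the nearest already-selected sample, and the two are summed with a hypothetical weight `diversity_weight`. The names `vote_entropy` and `select_samples` are illustrative.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], int]


def vote_entropy(x: Vector, committee: List[Classifier]) -> float:
    """Disagreement of the committee on x (assumed measure: entropy of the votes)."""
    votes = Counter(member(x) for member in committee)
    n = len(committee)
    return -sum((c / n) * math.log(c / n) for c in votes.values())


def select_samples(
    candidates: List[Vector],
    committee: List[Classifier],
    k: int,
    diversity_weight: float = 1.0,  # hypothetical trade-off parameter
) -> List[int]:
    """Greedily pick up to k candidate indices with high disagreement and diversity.

    Diversity is the Euclidean distance to the nearest already-selected sample,
    so near-duplicates of earlier picks score lower.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            if selected:
                diversity = min(math.dist(candidates[i], candidates[j]) for j in selected)
            else:
                diversity = 0.0  # first pick: disagreement alone decides
            return vote_entropy(candidates[i], committee) + diversity_weight * diversity

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with a committee of three threshold rules on the first feature, the first pick is the point the members disagree on most, and later picks trade disagreement against distance from earlier picks; a larger `diversity_weight` favours spread-out samples. In the method described above, the selected samples would then be shown to the discriminator with the same labels as real samples; that step is not sketched here.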

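The abstract states only that the generator and discriminator training epochs are adjusted from the difference between their losses. One plausible rule, sketched below purely as an assumption, gives extra update steps to whichever network currently has the higher loss, in proportion to the gap. The name `schedule_steps`, the `threshold` and the `max_steps` cap are hypothetical, and the sketch assumes both losses are on comparable scales, which raw WGAN critic and generator losses need not be.

```python
from typing import Tuple


def schedule_steps(
    d_loss: float,
    g_loss: float,
    threshold: float = 0.1,  # hypothetical tolerance for "balanced" losses
    max_steps: int = 5,      # hypothetical cap on steps per round
) -> Tuple[int, int]:
    """Return (discriminator_steps, generator_steps) for the next training round.

    Hypothetical rule: if the losses differ by more than `threshold`, the network
    with the higher loss gets one extra step per full `threshold` of gap, capped
    at `max_steps`; otherwise each network gets a single step.
    """
    gap = g_loss - d_loss
    if abs(gap) <= threshold:
        return 1, 1  # roughly balanced: alternate one step each
    extra = min(max_steps, 1 + int(abs(gap) / threshold))
    if gap > 0:
        return 1, extra  # generator is behind: train it more
    return extra, 1  # discriminator is behind: train it more
```

In a training loop, the losses from the previous round would be passed in and the returned pair used as the number of discriminator and generator updates for the next round.
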

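The four evaluation measures named in the abstract can be computed from a binary confusion matrix, with collision as the positive class. The sketch below assumes that F-measure means F1 (equal weight on precision and recall) and uses the common definition of G-mean as the geometric mean of recall on the positive class and specificity on the negative class; the paper may define them differently. The function name `imbalance_metrics` is illustrative.

```python
import math
from typing import Dict, Sequence


def imbalance_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Precision, recall, F1 and G-mean for a binary task (assumed definitions)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0            # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0       # true negative rate
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "g_mean": g_mean}
```

A classifier that predicts "non-collision" for every sample can still reach high accuracy on imbalanced data, but its recall, F-measure and G-mean are all 0, which is why measures of this kind are used instead of plain accuracy.
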