Addressing data imbalance in collision risk prediction with active generative oversampling.

Authors

Li Li, Zhang Xiaoliang

Affiliation

Information Engineering School, Jiaozuo Normal College, Jiaozuo, 454000, China.

Publication

Sci Rep. 2025 Mar 17;15(1):9133. doi: 10.1038/s41598-025-93851-3.

Abstract

Data imbalance is a critical factor affecting the predictive accuracy in collision risk assessment. This study proposes an advanced active generative oversampling method based on Query by Committee (QBC) and Auxiliary Classifier Generative Adversarial Network (ACGAN), integrated with the Wasserstein Generative Adversarial Network (WGAN) framework. Our method selectively enriches minority class samples through QBC and diversity metrics to enhance the diversity of sample generation, thereby improving the performance of fault classification algorithms. By equating the labels of selected samples to those of real samples, we increase the accuracy of the discriminator, forcing the generator to produce more diverse outputs, which is expected to improve classification results. We also propose a method for dynamically adjusting the training epochs of the generator and discriminator based on loss differences to achieve balance in model training. Empirical analysis on four publicly available imbalanced datasets shows that our method outperforms existing methods in terms of precision, recall, F-measure, and G-mean. Specifically, our method's results are above 0.92 on all evaluation indicators, with an average improvement of 23-28.3% compared to the worst-performing ENN method. This indicates that our method has a significant advantage in handling data imbalance, being able to more accurately identify collision samples and reduce the misclassification rate of non-collision samples.
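The four evaluation indicators named in the abstract can be computed from a binary confusion matrix. The sketch below is illustrative only and is not taken from the paper: the function name `imbalance_metrics` and the example counts are hypothetical, and G-mean is assumed to be the geometric mean of recall (sensitivity) and specificity, which is a common definition in imbalanced learning but may differ from the one the authors used.

```python
import math


def imbalance_metrics(tp, fp, fn, tn):
    """Precision, recall, F-measure and G-mean for the positive (minority) class.

    tp, fp, fn, tn are confusion-matrix counts. Any ratio whose denominator
    is zero is reported as 0.0 instead of raising an error.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # F-measure (F1): harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    # G-mean (assumed definition): sqrt(sensitivity * specificity).
    g_mean = math.sqrt(recall * specificity)

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Hypothetical counts: 50 collision samples (minority), 950 non-collision samples.
m = imbalance_metrics(tp=45, fp=5, fn=5, tn=945)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Under this definition, G-mean falls to zero whenever a classifier ignores one class entirely, which is why it is often reported alongside F-measure on imbalanced data.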

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/08dd/11914271/1cb0955fd1fb/41598_2025_93851_Fig1_HTML.jpg
