Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance.

Authors

Mazurowski Maciej A, Habas Piotr A, Zurada Jacek M, Lo Joseph Y, Baker Jay A, Tourassi Georgia D

Affiliation

Computational Intelligence Lab, Department of Electrical and Computer Engineering, University of Louisville, Louisville, KY 40292, USA.

Publication

Neural Netw. 2008 Mar-Apr;21(2-3):427-36. doi: 10.1016/j.neunet.2007.12.031. Epub 2007 Dec 27.

DOI:10.1016/j.neunet.2007.12.031
PMID:18272329
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2346433/

Abstract

This study investigates the effect of class imbalance in training data when developing neural network classifiers for computer-aided medical diagnosis. The investigation is performed in the presence of other characteristics that are typical of medical data, namely small training sample size, a large number of features, and correlations between features. Two methods of neural network training are explored: classical backpropagation (BP) and particle swarm optimization (PSO) with clinically relevant training criteria. An experimental study is performed using simulated data, and the conclusions are further validated on real clinical data for breast cancer diagnosis. The results show that classifier performance deteriorates with even modest class imbalance in the training data. Further, BP is shown to be generally preferable to PSO for imbalanced training data, especially with a small data sample and a large number of features. Finally, no clear preference emerges between oversampling and the no-compensation approach, and some guidance is provided on choosing between them.
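
To make the oversampling option concrete, here is a minimal sketch of one common variant: randomly duplicating minority-class cases until each class matches the majority count. The abstract does not describe the exact procedure used in the study, so the function name `oversample_minority`, the toy data, and the duplicate-with-replacement scheme are all assumptions for illustration only. The no-compensation approach corresponds to training on `X` and `y` unchanged.

```python
import random
from collections import Counter


def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate cases of smaller classes until all classes
    have as many cases as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class

    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original cases belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw duplicates with replacement to close the gap.
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data (assumed): 8 negative cases and 2 positive cases, a 4:1 imbalance.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = oversample_minority(X, y)
print(Counter(y), Counter(y_bal))
```

Oversampling of this kind adds no new information, only repeated cases, which may be one reason it does not reliably outperform training on the imbalanced data as-is.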
