Synthetic minority oversampling of vital statistics data with generative adversarial networks.

Affiliations

Department of Future Technologies, University of Turku, Turku, Finland.

PerkinElmer, Turku, Finland.

Publication information

J Am Med Inform Assoc. 2020 Nov 1;27(11):1667-1674. doi: 10.1093/jamia/ocaa127.

DOI:10.1093/jamia/ocaa127
PMID:32885818
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7750982/
Abstract

OBJECTIVE

Minority oversampling is a standard approach for adjusting the ratio between classes in imbalanced data. However, established methods often provide only modest improvements in classification performance when applied to data with an extremely imbalanced class distribution and to mixed-type data. Both conditions are common in vital statistics data, where the outcome incidence dictates the number of positive observations. In this article, we developed a novel neural network-based oversampling method called actGAN (activation-specific generative adversarial network) that generates synthetic observations useful for improving prediction performance in this setting.
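To make the idea of oversampling mixed-type data concrete, here is a minimal sketch of how a SMOTE-NC-style method creates one synthetic minority observation: continuous features are interpolated between a minority sample and one of its neighbors, and nominal features take the most frequent value among the neighbors. This is a simplification, not the authors' code. The neighbor search is omitted, and the feature names, the toy values, and the `synthesize` helper are hypothetical.

```python
import random
from collections import Counter

def synthesize(sample, neighbors, continuous_idx, nominal_idx, rng):
    """Create one synthetic observation from a minority sample and its neighbors."""
    neighbor = rng.choice(neighbors)   # pick one neighbor to interpolate toward
    gap = rng.random()                 # interpolation factor in [0, 1)
    synthetic = list(sample)
    for i in continuous_idx:
        # Continuous features: a point on the segment between sample and neighbor.
        synthetic[i] = sample[i] + gap * (neighbor[i] - sample[i])
    for i in nominal_idx:
        # Nominal features: the most frequent category among the neighbors.
        synthetic[i] = Counter(n[i] for n in neighbors).most_common(1)[0][0]
    return synthetic

# Hypothetical toy rows: [maternal age, prior pregnancies, infection flag].
sample = [30.0, 1.0, "yes"]
neighbors = [[34.0, 2.0, "no"], [28.0, 0.0, "no"], [32.0, 3.0, "yes"]]

rng = random.Random(0)
synthetic = synthesize(sample, neighbors, [0, 1], [2], rng)
print(synthetic)
```

Because the nominal value is decided by a majority vote rather than learned from the data distribution, this mechanism carries little information about how categorical and continuous features interact.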

MATERIALS AND METHODS

From vital statistics data, the outcome of early stillbirth was chosen to be predicted based on demographics, pregnancy history, and infections. The data contained 363 560 live births and 139 early stillbirths, giving a class distribution of 99.96% and 0.04%, respectively. The hyperparameters of actGAN and of a baseline method, SMOTE-NC (Synthetic Minority Over-sampling Technique-Nominal Continuous), were tuned with Bayesian optimization, and both were compared against a cost-sensitive learning-only approach.
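The reported class shares can be reproduced from the two counts, and the same counts give a rough feel for what a cost-sensitive baseline has to compensate for. The sketch below assumes the common "balanced" inverse-frequency weighting, n_total / (n_classes * n_class). The abstract does not state which weighting scheme the authors used, so the weights here are illustrative only.

```python
# Counts reported in the abstract.
n_live_births = 363_560
n_early_stillbirths = 139
n_total = n_live_births + n_early_stillbirths

# Class shares in percent.
share_live = 100 * n_live_births / n_total
share_stillbirth = 100 * n_early_stillbirths / n_total

# Assumption: "balanced" inverse-frequency class weights (not confirmed by the source).
n_classes = 2
class_weight = {
    "live_birth": n_total / (n_classes * n_live_births),
    "early_stillbirth": n_total / (n_classes * n_early_stillbirths),
}

print(f"{share_live:.2f}% vs {share_stillbirth:.2f}%")  # 99.96% vs 0.04%
print(class_weight)
```

Under this weighting, misclassifying one early stillbirth costs roughly as much as misclassifying about 2600 live births, which is the ratio of the two class counts.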

RESULTS

While SMOTE-NC provided mixed results, actGAN consistently improved both the true positive rate at a clinically significant false positive rate and the area under the receiver operating characteristic curve.
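The two evaluation measures named here can be computed directly from classifier scores. The following sketch uses plain Python and a tiny made-up score set. The false positive rate limits are placeholders, since the abstract does not give the clinically significant value, and the function names are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the chance that a random positive outranks a random negative (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def tpr_at_fpr(scores_pos, scores_neg, max_fpr):
    """Highest true positive rate reachable while keeping the false positive rate <= max_fpr."""
    best = 0.0
    for threshold in sorted(set(scores_pos) | set(scores_neg)):
        fpr = sum(s >= threshold for s in scores_neg) / len(scores_neg)
        if fpr <= max_fpr:
            tpr = sum(s >= threshold for s in scores_pos) / len(scores_pos)
            best = max(best, tpr)
    return best

# Made-up scores for illustration only.
scores_pos = [0.9, 0.8, 0.3]        # positive class (e.g. early stillbirth)
scores_neg = [0.7, 0.4, 0.2, 0.1]   # negative class (e.g. live birth)

auc = roc_auc(scores_pos, scores_neg)
tpr_strict = tpr_at_fpr(scores_pos, scores_neg, max_fpr=0.25)
tpr_loose = tpr_at_fpr(scores_pos, scores_neg, max_fpr=0.50)
print(auc, tpr_strict, tpr_loose)
```

Reporting the true positive rate at a fixed false positive rate focuses the comparison on the operating region that matters clinically, whereas the AUC summarizes ranking quality over all thresholds.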

DISCUSSION

Including an activation-specific output layer in the generator network of actGAN enables the addition of information about the underlying data structure, which outperforms the nominal mechanism of SMOTE-NC.
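One plausible reading of an activation-specific output layer is that each block of generator outputs gets the activation that matches its feature type: linear for continuous values, sigmoid for binary flags, softmax for one-hot categorical groups. The sketch below illustrates that reading for a single output row. The abstract does not describe the actual actGAN architecture, so the layout format, the feature types, and the function names are assumptions.

```python
import math

def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def softmax(values):
    """Softmax over one categorical block; subtracting the max avoids overflow."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def activation_specific_output(logits, layout):
    """Apply a per-block activation; layout is a list of (kind, width) pairs (hypothetical format)."""
    out, start = [], 0
    for kind, width in layout:
        block = logits[start:start + width]
        if kind == "continuous":
            out.extend(block)                        # linear activation
        elif kind == "binary":
            out.extend(sigmoid(v) for v in block)    # probability of the flag
        elif kind == "categorical":
            out.extend(softmax(block))               # probabilities summing to 1
        else:
            raise ValueError(f"unknown feature kind: {kind}")
        start += width
    if start != len(logits):
        raise ValueError("layout widths do not match the number of logits")
    return out

# Hypothetical layout: 2 continuous features, 1 binary flag, 1 three-level category.
layout = [("continuous", 2), ("binary", 1), ("categorical", 3)]
logits = [0.5, -1.2, 2.0, 0.1, 0.1, 0.1]
row = activation_specific_output(logits, layout)
print(row)
```

Encoding the feature types in the output layer lets the generator emit valid mixed-type rows directly, instead of relying on a post hoc rule for the nominal columns.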

CONCLUSIONS

actGAN improves the prediction performance for our learning task. Our method could be applied to other mixed-type data prediction tasks that are known to be afflicted by class imbalance and limited data availability.
