

Enhancing early gestational diabetes mellitus prediction with imputation-based machine learning framework: A comparative study on real-world clinical records.

Author information

Ma Leyao, Yang Lin, Wang Yaxin, Hao Jie, Li Yini, Ma Liangkun, Wang Ziyang, Li Ye, Zhang Suhan, Hu Mingyue, Li Jiao, Sun Yin

Affiliations

Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.

Key Laboratory of Medical Information Intelligent Technology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.

Publication information

Digit Health. 2025 Jul 29;11:20552076251352436. doi: 10.1177/20552076251352436. eCollection 2025 Jan-Dec.

DOI: 10.1177/20552076251352436
PMID: 40755962
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12317186/
Abstract

OBJECTIVE

Gestational diabetes mellitus (GDM) is one of the most common pregnancy complications. Electronic health records (EHRs) hold promise for GDM risk prediction, but missing data make it difficult to develop reliable and generalizable risk prediction models. This study aims to address the problem of missing EHR data in GDM prediction before 12 weeks of gestation.

METHODS

A total of 5066 women aged 18 to 50 years with singleton pregnancies were included in this retrospective study. We evaluated six imputation methods in combination with four machine learning classification models. The evaluation covered downstream predictive performance, robustness to varying levels of variable missingness, ability to restore the original data distribution, and influence on feature selection, based on 10-fold cross-validation.
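
The cross-validated comparison of imputation methods and classifiers can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: `fit_mean_imputer` is a hypothetical mean-imputation stand-in for the six methods (the abstract names only mice), `majority_accuracy` is a toy scorer standing in for the four classifiers, and the folds are plain rather than stratified. What it does show is the ordering that matters: the imputer is refit on the training folds of every split, so values from the held-out fold never influence the imputation learned for training.

```python
import random
from statistics import mean


def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean_imputer(rows):
    """Hypothetical stand-in imputer: learn column means from observed (non-None) cells."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(mean(observed) if observed else 0.0)

    def impute(batch):
        # Replace every None with the column mean learned at fit time.
        return [[means[j] if v is None else v for j, v in enumerate(r)] for r in batch]

    return impute


def majority_accuracy(X_train, y_train, X_test, y_test):
    """Toy classifier (hypothetical): predict the most frequent training label."""
    guess = max(set(y_train), key=y_train.count)
    return sum(y == guess for y in y_test) / len(y_test)


def cross_validate(rows, labels, fit_imputer, fit_and_score, k=10, seed=0):
    """k-fold CV that refits the imputer on the training folds of every split."""
    scores = []
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        impute = fit_imputer([rows[i] for i in train_idx])  # never sees the test fold
        X_train = impute([rows[i] for i in train_idx])
        X_test = impute([rows[i] for i in test_idx])
        y_train = [labels[i] for i in train_idx]
        y_test = [labels[i] for i in test_idx]
        scores.append(fit_and_score(X_train, y_train, X_test, y_test))
    return scores


def evaluate_grid(rows, labels, imputers, classifiers, k=10):
    """Mean CV score for every (imputer, classifier) combination."""
    return {
        (imp_name, clf_name): mean(cross_validate(rows, labels, fit_imp, score, k))
        for imp_name, fit_imp in imputers.items()
        for clf_name, score in classifiers.items()
    }


if __name__ == "__main__":
    # Toy data: two features, the second one missing in every third row.
    rows = [[float(i), None if i % 3 == 0 else float(i % 7)] for i in range(40)]
    labels = [1 if i % 4 == 0 else 0 for i in range(40)]
    print(evaluate_grid(rows, labels, {"mean": fit_mean_imputer},
                        {"majority": majority_accuracy}))
```

A real comparison would plug the actual imputers and classifiers into the same two callables; keeping the imputer fit inside `cross_validate` is the design choice that carries over.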

RESULTS

Our findings revealed a significant improvement in model performance with imputation. When using the top 30 features, logistic regression (LR) with multivariate imputation by chained equations using classification and regression trees (mice) achieved the highest area under the receiver operating characteristic curve (AUC), 0.6899, compared with 0.6336 for the LR model without imputation. The mice method also yielded the best average performance across prediction models and the most accurate restoration of the original data distribution. LR models trained on mice-imputed data remained the most robust across varying levels of missingness. Differences in predictive performance were driven primarily by the choice of classification algorithm. In addition, we identified 18 key features for early GDM prediction in the Chinese population.
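
The AUC values reported here have a direct probabilistic reading: an AUC of 0.6899 means that, for a randomly drawn pair made up of one woman who developed GDM and one who did not, the model assigns the higher risk score to the woman with GDM about 69% of the time, where 0.5 corresponds to chance ranking. A minimal stdlib sketch of that pairwise definition follows; `roc_auc` is a hypothetical helper name, ties are counted as half a correct ordering, and the O(P·N) loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for a positive case (e.g. GDM), 0 for a negative case.
    scores: predicted risk, higher means more likely positive.
    A tie between a positive and a negative counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive/negative pairs are ordered correctly.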

CONCLUSION

This study demonstrates the critical role of imputation in improving the performance and fairness of GDM prediction models. The findings provide practical guidance for integrating imputation into clinical machine learning pipelines.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe64/12317186/b4637c8a3d83/10.1177_20552076251352436-fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe64/12317186/aa6f3cf04777/10.1177_20552076251352436-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe64/12317186/e2910b996998/10.1177_20552076251352436-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe64/12317186/9b7b5560620e/10.1177_20552076251352436-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe64/12317186/2787a8fae8af/10.1177_20552076251352436-fig4.jpg

Similar articles

1
Enhancing early gestational diabetes mellitus prediction with imputation-based machine learning framework: A comparative study on real-world clinical records.
Digit Health. 2025 Jul 29;11:20552076251352436. doi: 10.1177/20552076251352436. eCollection 2025 Jan-Dec.
2
Does the Presence of Missing Data Affect the Performance of the SORG Machine-learning Algorithm for Patients With Spinal Metastasis? Development of an Internet Application Algorithm.
Clin Orthop Relat Res. 2024 Jan 1;482(1):143-157. doi: 10.1097/CORR.0000000000002706. Epub 2023 Jun 12.
3
Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?
Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.
4
Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
5
A Responsible Framework for Assessing, Selecting, and Explaining Machine Learning Models in Cardiovascular Disease Outcomes Among People With Type 2 Diabetes: Methodology and Validation Study.
JMIR Med Inform. 2025 Jun 27;13:e66200. doi: 10.2196/66200.
6
Generative adversarial networks for imputing missing data for big data clinical research.
BMC Med Res Methodol. 2021 Apr 20;21(1):78. doi: 10.1186/s12874-021-01272-3.
7
Predictive modeling of complications arising from early-onset preeclampsia in pregnant women.
Womens Health (Lond). 2025 Jan-Dec;21:17455057251348978. doi: 10.1177/17455057251348978. Epub 2025 Jul 21.
8
Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.
Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.
9
Deciphering Shared Gene Signatures and Immune Infiltration Characteristics Between Gestational Diabetes Mellitus and Preeclampsia by Integrated Bioinformatics Analysis and Machine Learning.
Reprod Sci. 2025 May 15. doi: 10.1007/s43032-025-01847-1.
10
Assessing and validating machine learning-enhanced imputation of admission American Spinal Injury Association Impairment Scale grades for spinal cord injury.
J Neurosurg Spine. 2025 May 9;43(1):90-97. doi: 10.3171/2025.1.SPINE241135. Print 2025 Jul 1.
