Handling missing data and measurement error for early-onset myopia risk prediction models.

Affiliations

School of Data Science, Fudan University, Shanghai, China.

School of Economics and Management, Beijing Forestry University, Beijing, China.

Publication information

BMC Med Res Methodol. 2024 Sep 6;24(1):194. doi: 10.1186/s12874-024-02319-x.

DOI:10.1186/s12874-024-02319-x
PMID:39243025
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11378546/
Abstract

BACKGROUND

Early identification of children at high risk of developing myopia is essential to prevent myopia progression by introducing timely interventions. However, missing data and measurement error (ME) are common challenges in risk prediction modelling that can introduce bias in myopia prediction.

METHODS

We explore four imputation methods to address missing data and ME: single imputation (SI), multiple imputation under missing at random (MI-MAR), multiple imputation with calibration procedure (MI-ME), and multiple imputation under missing not at random (MI-MNAR). We compare four machine-learning models (Decision Tree, Naive Bayes, Random Forest, and XGBoost) and three statistical models (logistic regression, stepwise logistic regression, and least absolute shrinkage and selection operator (LASSO) logistic regression) in myopia risk prediction. We apply these models to the Shanghai Jinshan Myopia Cohort Study and also conduct a simulation study to investigate the impact of missing mechanisms, the degree of ME, and the importance of predictors on model performance. Model performance is evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
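The comparison of SI against MI-MAR, scored by AUROC and AUPRC, can be illustrated with a minimal sketch. It is not the authors' pipeline: the simulated data, the function names, the use of scikit-learn's `IterativeImputer` with `sample_posterior=True` as a stand-in for multiple imputation, and the choice to average predicted probabilities over imputations (rather than pool coefficients with Rubin's rules) are all assumptions made for illustration. The outcome is also left out of the imputation model here, which is a simplification.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split


def simulate(n=2000, seed=0):
    """Hypothetical data: three predictors, binary outcome, MAR missingness in x0."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    X[:, 1] += 0.6 * X[:, 0]  # make x1 informative about x0
    logit = -1.0 + 1.2 * X[:, 0] + 0.8 * X[:, 1] - 0.5 * X[:, 2]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    # MAR: the chance that x0 is missing depends only on the observed x1.
    p_miss = 1.0 / (1.0 + np.exp(-(X[:, 1] - 0.5)))
    X_obs = X.copy()
    X_obs[rng.random(n) < p_miss, 0] = np.nan
    return X_obs, y


def evaluate(prob, y):
    """Return the two metrics named in the abstract."""
    return {
        "auroc": roc_auc_score(y, prob),
        "auprc": average_precision_score(y, prob),
    }


def single_imputation(X_tr, y_tr, X_te):
    """SI: fill each missing value once with the training mean."""
    imp = SimpleImputer(strategy="mean").fit(X_tr)
    model = LogisticRegression().fit(imp.transform(X_tr), y_tr)
    return model.predict_proba(imp.transform(X_te))[:, 1]


def multiple_imputation(X_tr, y_tr, X_te, m=5):
    """MI sketch: m stochastic imputations, one model each, averaged predictions."""
    probs = []
    for k in range(m):
        imp = IterativeImputer(sample_posterior=True, random_state=k).fit(X_tr)
        model = LogisticRegression().fit(imp.transform(X_tr), y_tr)
        probs.append(model.predict_proba(imp.transform(X_te))[:, 1])
    return np.mean(probs, axis=0)


if __name__ == "__main__":
    X, y = simulate()
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y
    )
    print("SI    ", evaluate(single_imputation(X_tr, y_tr, X_te), y_te))
    print("MI-MAR", evaluate(multiple_imputation(X_tr, y_tr, X_te), y_te))
```

Fitting the imputer on the training split only and reusing it on the test split keeps test information out of model development. How far the two approaches differ in this toy setting depends on the simulated missingness rate and predictor strength.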

RESULTS

Our findings indicate that in scenarios with missing data and ME, using MI-ME in combination with logistic regression yields the best prediction results. In scenarios without ME, employing MI-MAR to handle missing data outperforms SI regardless of the missing mechanism. When ME has a greater impact on prediction than missing data, the relative advantage of MI-MAR diminishes, and MI-ME becomes superior. Furthermore, our results demonstrate that statistical models exhibit better prediction performance than machine-learning models.
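Why ME matters for a logistic model can be seen in a small simulation. The sketch below uses generic regression calibration on a hypothetical validation subset in which the true predictor is observed. It is not the MI-ME procedure of the paper, and every name and parameter value in it is an assumption. It only shows that classical ME attenuates the fitted coefficient and that a calibration step moves it back toward the value obtained with the error-free predictor. With a single predictor, a linear calibration does not change the ranking of predictions, so the example concerns coefficient bias rather than AUROC.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression


def attenuation_demo(n=5000, n_val=500, sigma_u=1.0, beta=1.0, seed=0):
    """Compare logistic slopes for the true, noisy, and calibrated predictor."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                     # true predictor (usually unobserved)
    w = x + rng.normal(scale=sigma_u, size=n)  # measurement with classical error
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-beta * x)))

    def slope(z):
        # A large C makes the default L2 penalty negligible.
        model = LogisticRegression(C=1e6).fit(z.reshape(-1, 1), y)
        return float(model.coef_[0, 0])

    # Hypothetical validation subset where both x and w are available.
    val = np.arange(n_val)
    calib = LinearRegression().fit(w[val].reshape(-1, 1), x[val])
    x_hat = calib.predict(w.reshape(-1, 1))    # estimate of E[x | w]

    return {"true": slope(x), "naive": slope(w), "calibrated": slope(x_hat)}


if __name__ == "__main__":
    for name, value in attenuation_demo().items():
        print(f"{name:>10}: {value:.3f}")
```

The naive slope is pulled toward zero because the noisy measurement carries only part of the signal in the true predictor. The calibrated slope is larger because the calibration model rescales the noisy measurement by an estimated reliability factor below one.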

CONCLUSION

MI-ME emerges as a reliable method for handling missing data and ME in important predictors for early-onset myopia risk prediction.
