Comparing AI/ML approaches and classical regression for predictive modeling using large population health databases: Applications to COVID-19 case prediction.

Authors

Bjerre Lise M, Peixoto Cayden, Alkurd Rawan, Talarico Robert, Abielmona Rami

Affiliations

Institut du Savoir Montfort, 713, chemin Montréal, Ottawa, Ontario K1K 0T2, Canada.

University of Ottawa, Faculty of Medicine, Department of Family Medicine, 201-600 Peter-Morand Crescent, Ottawa ON, K1G 5Z3, Canada.

Publication information

Glob Epidemiol. 2024 Oct 4;8:100168. doi: 10.1016/j.gloepi.2024.100168. eCollection 2024 Dec.

DOI: 10.1016/j.gloepi.2024.100168
PMID: 39435397
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11492135/

Abstract

BACKGROUND

Research comparing artificial intelligence and machine learning (AI/ML) methods with classical statistical methods applied to large population health databases is limited.

OBJECTIVES

This retrospective cohort study aimed to compare the predictive performance of AI/ML algorithms against conventional multivariate logistic regression models using linked health administrative data.

METHODS

Using Ontario's population health databases, we created a cohort of residents of the city of Ottawa, Ontario, who underwent a PCR test for COVID-19 between March 10, 2020, and May 13, 2021. Using demographic, socio-economic and health data (including COVID-19 PCR test results and available symptom data), we developed predictive models for COVID-19 case identification using the following approaches: classical multivariate logistic regression (LR); deep neural network (DNN); random forest (RF); and gradient boosting trees (GBT). Model performance was compared using swarm plots of the area under the curve (AUC) from 10-fold cross-validation.
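The evaluation described above (per-fold AUC from 10-fold cross-validation, compared across models) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the two "models" are toy stand-ins rather than LR, DNN, RF or GBT, and every function name here (`auc`, `stratified_folds`, `cross_validated_auc`, `fit_prevalence_only`, `fit_symptom_rate`) is hypothetical. Summarising folds as mean ± standard deviation is also an assumption about how per-fold AUCs might be reported.

```python
import random
import statistics


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A positive scored above a negative counts 1; a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split row indices into k folds with a similar class balance in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(labels, features, fit, k=10, seed=0):
    """Per-fold AUCs for a model given as fit(train_x, train_y) -> predict(x)."""
    results = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        scores = [predict(features[i]) for i in held_out]
        results.append(auc([labels[i] for i in held_out], scores))
    return results


def fit_prevalence_only(train_x, train_y):
    """Toy baseline: ignores the feature and predicts the training prevalence."""
    rate = sum(train_y) / len(train_y)
    return lambda x: rate


def fit_symptom_rate(train_x, train_y):
    """Toy model: predicts the training positive rate for each symptom flag."""
    rates = {}
    for flag in (0, 1):
        ys = [y for x, y in zip(train_x, train_y) if x == flag]
        rates[flag] = sum(ys) / len(ys) if ys else 0.0
    return lambda x: rates[x]


# Synthetic data (assumed values, not the study's): about 10% positives,
# with a single binary "symptom" flag that is more common among positives.
rng = random.Random(42)
labels = [1 if rng.random() < 0.1 else 0 for _ in range(2000)]
symptom = [1 if rng.random() < (0.7 if y else 0.2) else 0 for y in labels]

for name, fit in [("no symptoms", fit_prevalence_only),
                  ("with symptoms", fit_symptom_rate)]:
    aucs = cross_validated_auc(labels, symptom, fit)
    print(f"{name}: AUC = {statistics.mean(aucs):.3f} ± {statistics.stdev(aucs):.3f}")
```

The constant baseline scores an AUC of exactly 0.5 in every fold by construction, since all of its predictions tie. Both models are evaluated on the same folds, so their per-fold AUCs can be compared pairwise; in practice the toy fitters would be replaced by real estimators.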

RESULTS

The cohort consisted of n = 351,248 Ottawa residents tested for COVID-19 during the study period. Among them, a total of n = 883,879 unique COVID-19 tests were performed (2.6% positive test results). Inclusion of COVID-19 symptoms data in the analysis improved model performance and variable predictive value across all tested models (p < 0.0001), with the 10-fold cross-validation AUC increasing to near or over 0.7 in all models when symptoms data were included. In various pairwise comparisons, the GBT method had the highest predictive ability (AUC = 0.796 ± 0.017), significantly outperforming multivariate logistic regression and the other AI/ML approaches.

CONCLUSIONS

In a moderately sized dataset with a reasonable number of features, conventional multivariate regression-based models provided better predictive accuracy than some machine learning algorithms and worse than others. However, whenever possible, the AI/ML GBT approach should be considered.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b821/11492135/f1ff54a44c43/gr3a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b821/11492135/a8cc8275a43c/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b821/11492135/9de85568191d/gr2.jpg
