The Use of Machine Learning for Analyzing Real-World Data in Disease Prediction and Management: Systematic Review.

Authors

Alhumaidi Norah Hamad, Dermawan Doni, Kamaruzaman Hanin Farhana, Alotaiq Nasser

Affiliations

College of Medicine, Qassim University, Buraidah, Saudi Arabia.

Applied Biotechnology, Faculty of Chemistry, Warsaw University of Technology, Warsaw, Poland.

Publication information

JMIR Med Inform. 2025 Jun 19;13:e68898. doi: 10.2196/68898.

Abstract

BACKGROUND

Machine learning (ML) and big data analytics are rapidly transforming health care, particularly disease prediction, management, and personalized care. With the increasing availability of real-world data (RWD) from diverse sources, such as electronic health records (EHRs), patient registries, and wearable devices, ML techniques present substantial potential to enhance clinical outcomes. Despite this promise, challenges such as data quality, model transparency, generalizability, and integration into clinical practice persist.

OBJECTIVE

This systematic review aims to examine the use of ML for analyzing RWD in disease prediction and management, identifying the most commonly used ML methods, prevalent disease types, study designs, and the sources of real-world evidence (RWE). It also explores the strengths and limitations of current practices, offering insights for future improvements.

METHODS

A comprehensive search was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to identify studies using ML techniques for analyzing RWD in disease prediction and management. The search focused on extracting data regarding the ML algorithms applied; disease categories studied; types of study designs (eg, clinical trials and cohort studies); and the sources of RWE, including EHRs, patient registries, and wearable devices. Studies published between 2014 and 2024 were included to ensure the analysis of the most recent advances in the field.

RESULTS

This review identified 57 studies that met the inclusion criteria, with a total sample size of >150,000 patients. The most frequently applied ML methods were random forest (n=24, 42%), logistic regression (n=21, 37%), and support vector machines (n=18, 32%). These methods were predominantly used for predictive modeling across disease areas, including cardiovascular diseases (n=19, 33%), cancer (n=9, 16%), and neurological disorders (n=6, 11%). RWE was primarily sourced from EHRs, patient registries, and wearable devices. A substantial portion of studies (n=38, 67%) focused on improving clinical decision-making, patient stratification, and treatment optimization. Among these studies, 14 (25%) focused on decision-making; 12 (21%) on health care outcomes, such as quality of life, recovery rates, and adverse events; and 11 (19%) on survival prediction, particularly in oncology and chronic diseases. For example, random forest models for cardiovascular disease prediction demonstrated an area under the curve of 0.85 (95% CI 0.81-0.89), while support vector machine models for cancer prognosis achieved an accuracy of 83% (P=.04). Despite the promising outcomes, many (n=34, 60%) studies faced challenges related to data quality, model interpretability, and ensuring generalizability across diverse patient populations.
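The percentages reported above can be checked against the stated total of 57 studies. The following is a minimal sketch, not part of the review itself: the dictionary labels and the helper name `pct` are chosen here for illustration, and only the counts and percentages are taken from the text.

```python
# Total number of included studies, as reported in the RESULTS paragraph.
TOTAL_STUDIES = 57

# (count, reported percentage) pairs copied from the RESULTS paragraph.
# The keys are illustrative labels, not terms defined by the review.
reported = {
    "random forest": (24, 42),
    "logistic regression": (21, 37),
    "support vector machines": (18, 32),
    "cardiovascular diseases": (19, 33),
    "cancer": (9, 16),
    "neurological disorders": (6, 11),
    "decision-making, stratification, optimization": (38, 67),
    "decision-making": (14, 25),
    "health care outcomes": (12, 21),
    "survival prediction": (11, 19),
    "data quality and related challenges": (34, 60),
}


def pct(count, total=TOTAL_STUDIES):
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


# Collect every entry whose recomputed percentage differs from the reported one.
mismatches = {
    name: (count, stated, pct(count))
    for name, (count, stated) in reported.items()
    if pct(count) != stated
}

print(mismatches)
```

Under these assumptions, each reported percentage equals its count divided by 57 and rounded to the nearest whole number, so `mismatches` is empty.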

CONCLUSIONS

This systematic review highlights the significant potential of ML and big data analytics in health care, especially for improving disease prediction and management. However, to fully realize the benefits of these technologies, future research must focus on addressing the challenges of data quality, enhancing model transparency, and ensuring the broader applicability of ML models across diverse populations and clinical settings.
