Fundació Institut Universitari per a la recerca a l'Atenció Primària de Salut Jordi Gol I Gurina (IDIAPJGol), Barcelona, 08007, Spain.
Department of Signal Theory and Communications, Universitat Politècnica de Catalunya (UPC), Barcelona, 08034, Spain.
J Am Med Inform Assoc. 2023 Nov 17;30(12):2072-2082. doi: 10.1093/jamia/ocad168.
To describe and appraise the use of artificial intelligence (AI) techniques that can cope with longitudinal data from electronic health records (EHRs) to predict health-related outcomes.
This review included studies in any language that used EHRs as at least one of the data sources, collected longitudinal data, used an AI technique capable of handling longitudinal data, and predicted any health-related outcome. We searched MEDLINE, Scopus, Web of Science, and IEEE Xplore from inception to January 3, 2022. Information on the dataset, prediction task, data preprocessing, feature selection, method, validation, performance, and implementation was extracted and summarized using descriptive statistics. Risk of bias and completeness of reporting were assessed using short forms of PROBAST and TRIPOD, respectively.
Eighty-one studies were included. Follow-up time and the number of records per patient varied greatly, and most studies predicted disease development or the next event based on diagnoses and drug treatments. Architectures were generally based on recurrent neural network (RNN)-like layers, although combining different layer types or using transformers has become more popular in recent years. About half of the included studies performed hyperparameter tuning and used attention mechanisms. Most used a single train-test partition and therefore could not correctly assess the variability of the model's performance. Reporting quality was poor, and a third of the studies were at high risk of bias.
AI models are increasingly using longitudinal data. However, heterogeneity in the reporting of methodology and results, together with the lack of public EHR datasets and code sharing, hinders replication.
PROSPERO database (CRD42022331388).