Artificial intelligence methods applied to longitudinal data from electronic health records for prediction of cancer: a scoping review.

Authors

Moglia Victoria, Johnson Owen, Cook Gordon, de Kamps Marc, Smith Lesley

Affiliations

School of Computing, University of Leeds, Woodhouse Lane, Leeds, LS2 9JT, UK.

Leeds Institute of Clinical Trials Research, University of Leeds, Clarendon Way, Leeds, LS2 9NL, UK.

Publication information

BMC Med Res Methodol. 2025 Jan 28;25(1):24. doi: 10.1186/s12874-025-02473-w.

Abstract

BACKGROUND

Early detection and diagnosis of cancer are vital to improving outcomes for patients. Artificial intelligence (AI) models have shown promise in the early detection and diagnosis of cancer, but there is limited evidence on methods that fully exploit the longitudinal data stored within electronic health records (EHRs). This review aims to summarise the methods currently used to predict cancer from longitudinal data and to provide recommendations on how such models should be developed.

METHODS

The review was conducted following PRISMA-ScR guidance. Six databases (MEDLINE, EMBASE, Web of Science, IEEE Xplore, PubMed and SCOPUS) were searched for relevant records published before 2 February 2024. Search terms related to the concepts "artificial intelligence", "prediction", "health records", "longitudinal", and "cancer". Data were extracted relating to six areas of the articles: (1) publication details, (2) study characteristics, (3) input data, (4) model characteristics, (5) reproducibility, and (6) quality assessment using the PROBAST tool. Models were evaluated against a framework for terminology relating to reporting of cancer detection and risk prediction models.

RESULTS

Of 653 records screened, 33 were included in the review; 10 predicted risk of cancer, 18 performed either cancer detection or early detection, 4 predicted recurrence, and 1 predicted metastasis. The most common cancers predicted in the studies were colorectal (n = 9) and pancreatic cancer (n = 9). Sixteen studies used feature engineering to represent temporal data, with the most common features representing trends. Eighteen used deep learning models that take a direct sequential input, most commonly recurrent neural networks, but also including convolutional neural networks and transformers. Prediction windows and lead times varied greatly between studies, even for models predicting the same cancer. High risk of bias was found in 90% of the studies. This risk was most often introduced by inappropriate study design (n = 26) and inadequate sample size (n = 26).
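As an illustration of what a trend feature computed with a lead time might look like, the following is a minimal sketch and not a method taken from any of the reviewed studies. The function name `trend_slope`, its parameters, and the haemoglobin values are all hypothetical. It assumes a patient's repeated measurements are available as (date, value) pairs, and it fits a least-squares slope over an observation window that ends a fixed lead time before the index date.

```python
from datetime import date


def trend_slope(observations, index_date, lead_time_days=0, window_days=365):
    """Least-squares slope (value units per day) of repeated measurements.

    Hypothetical helper for illustration only. Observations are (date, value)
    pairs. Only measurements inside the observation window are used; the
    window ends `lead_time_days` before `index_date` and spans `window_days`.
    Returns None when fewer than two usable measurements remain.
    """
    window_end = index_date.toordinal() - lead_time_days
    window_start = window_end - window_days
    points = [
        (d.toordinal(), v)
        for d, v in observations
        if window_start <= d.toordinal() <= window_end
    ]
    if len(points) < 2:
        return None

    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    denom = sum((t - mean_t) ** 2 for t, _ in points)
    if denom == 0:
        # All measurements fall on the same day, so no trend can be estimated.
        return None
    return sum((t - mean_t) * (v - mean_v) for t, v in points) / denom


# Made-up haemoglobin values (g/L) falling by 2 g/L every 10 days.
haemoglobin = [
    (date(2023, 1, 1), 140.0),
    (date(2023, 1, 11), 138.0),
    (date(2023, 1, 21), 136.0),
]
slope = trend_slope(haemoglobin, index_date=date(2023, 3, 1), lead_time_days=30)
```

Making the lead time and window length explicit parameters is one way to record where in a patient's trajectory a model is applied. A longer lead time excludes measurements closer to the index date, so the same patient can yield a different feature value, or none at all.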

CONCLUSION

This review highlights the breadth of approaches to cancer prediction from longitudinal data. We identify areas where reporting of methods could be improved, particularly regarding where in a patient's trajectory the model is applied. The review shows opportunities for further work, including comparison of these approaches and their applications in other cancers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e26/11773903/385d801dd511/12874_2025_2473_Fig1_HTML.jpg
