ArcTEX: a novel clinical data enrichment pipeline to support real-world evidence oncology studies.

Author information

Tait Keiran, Cronin Joseph, Wiper Olivia, Wallis Jamie, Davies Jim, Dürichen Robert

Affiliations

Arcturis Data, Kidlington, United Kingdom.

Department of Computer Science, University of Oxford, Oxford, United Kingdom.

Publication information

Front Digit Health. 2025 May 9;7:1561358. doi: 10.3389/fdgth.2025.1561358. eCollection 2025.

DOI:10.3389/fdgth.2025.1561358

PMID:40416094

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12098606/

Abstract

Data stored within electronic health records (EHRs) are a valuable source of information for real-world evidence (RWE) studies in oncology. However, many key clinical features are available only in unstructured notes. We present ArcTEX, a novel data enrichment pipeline developed to extract oncological features from unstructured NHS clinical notes with high accuracy, even in resource-constrained environments where GPU availability may be limited. By design, the predicted outcomes of ArcTEX are free of patient-identifiable information, making the pipeline well suited for use in NHS Trust environments. We compare our pipeline with existing discriminative and generative models and show that it outperforms approaches such as Llama 3/3.1/3.2 and BERT-based models, reaching a mean accuracy of 98.67% across several essential clinical features in endometrial and breast cancer. Additionally, we show that as few as 50 annotated training examples are needed to adapt the model to a different oncology area, such as lung cancer, with a different set of priority clinical features, achieving a comparable mean accuracy of 95%.

