ArcTEX-a novel clinical data enrichment pipeline to support real-world evidence oncology studies.

Authors

Tait Keiran, Cronin Joseph, Wiper Olivia, Wallis Jamie, Davies Jim, Dürichen Robert

Affiliations

Arcturis Data, Kidlington, United Kingdom.

Department of Computer Science, University of Oxford, Oxford, United Kingdom.

Publication information

Front Digit Health. 2025 May 9;7:1561358. doi: 10.3389/fdgth.2025.1561358. eCollection 2025.

DOI: 10.3389/fdgth.2025.1561358
PMID: 40416094
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12098606/

Abstract

Data stored within electronic health records (EHRs) offer a valuable source of information for real-world evidence (RWE) studies in oncology. However, many key clinical features are only available within unstructured notes. We present ArcTEX, a novel data enrichment pipeline developed to extract oncological features from NHS unstructured clinical notes with high accuracy, even in resource-constrained environments where GPU availability may be limited. By design, the outputs predicted by ArcTEX are free of patient-identifiable information, making the pipeline well suited for use in NHS Trust environments. We compare our pipeline with existing discriminative and generative models and show that it outperforms approaches such as Llama3/3.1/3.2 and BERT-based models, with a mean accuracy of 98.67% for several essential clinical features in endometrial and breast cancer. Additionally, we show that as few as 50 annotated training examples suffice to adapt the model to a different oncology area, such as lung cancer, with a different set of priority clinical features, achieving a comparable mean accuracy of 95%.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/401c/12098606/0c1bd919ae9a/fdgth-07-1561358-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/401c/12098606/3878adaed230/fdgth-07-1561358-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/401c/12098606/75f45793a092/fdgth-07-1561358-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/401c/12098606/1bfd4e89d21e/fdgth-07-1561358-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/401c/12098606/0d9c1b2bd22d/fdgth-07-1561358-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/401c/12098606/5cb9f6301cac/fdgth-07-1561358-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/401c/12098606/d822dc3ab8ca/fdgth-07-1561358-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/401c/12098606/c37db4063a74/fdgth-07-1561358-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/401c/12098606/70e13a7e0bca/fdgth-07-1561358-g009.jpg
