Optimizing Clinical Trial Eligibility Design Using Natural Language Processing Models and Real-World Data: Algorithm Development and Validation.

Authors

Lee Kyeryoung, Liu Zongzhi, Mai Yun, Jun Tomi, Ma Meng, Wang Tongyu, Ai Lei, Calay Ediz, Oh William, Stolovitzky Gustavo, Schadt Eric, Wang Xiaoyan

Affiliations

GeneDx (Sema4), Stamford, CT, United States.

Icahn School of Medicine at Mount Sinai, New York, NY, United States.

Publication information

JMIR AI. 2024 Jul 29;3:e50800. doi: 10.2196/50800.

DOI:10.2196/50800
PMID:39073872
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11319878/
Abstract

BACKGROUND

Clinical trials are vital for developing new therapies but can also delay drug development. Efficient trial data management, optimized trial protocol, and accurate patient identification are critical for reducing trial timelines. Natural language processing (NLP) has the potential to achieve these objectives.

OBJECTIVE

This study aims to assess the feasibility of using data-driven approaches to optimize clinical trial protocol design and identify eligible patients. This involves creating a comprehensive eligibility criteria knowledge base integrated within electronic health records using deep learning-based NLP techniques.

METHODS

We obtained data of 3281 industry-sponsored phase 2 or 3 interventional clinical trials recruiting patients with non-small cell lung cancer, prostate cancer, breast cancer, multiple myeloma, ulcerative colitis, and Crohn disease from ClinicalTrials.gov, spanning the period between 2013 and 2020. A customized bidirectional long short-term memory- and conditional random field-based NLP pipeline was used to extract all eligibility criteria attributes and convert hypernym concepts into computable hyponyms along with their corresponding values. To illustrate the simulation of clinical trial design for optimization purposes, we selected a subset of patients with non-small cell lung cancer (n=2775), curated from the Mount Sinai Health System, as a pilot study.
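
The conversion of hypernym concepts into computable hyponyms with values can be pictured as a lookup from a broad criterion phrase to a list of concrete, checkable attributes. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `Hyponym` class, the `HYPERNYM_MAP` table, the `expand` helper, and every threshold shown are hypothetical placeholders chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hyponym:
    """One computable attribute with a comparison and a value (hypothetical schema)."""
    attribute: str  # e.g., a laboratory test name
    operator: str   # comparison operator as text, e.g., "<=" or ">="
    value: float
    unit: str


# Hypothetical mapping. The thresholds are illustrative placeholders,
# NOT values reported by the study.
HYPERNYM_MAP: Dict[str, List[Hyponym]] = {
    "adequate renal function": [
        Hyponym("serum creatinine", "<=", 1.5, "mg/dL"),
        Hyponym("creatinine clearance", ">=", 60.0, "mL/min"),
    ],
    "adequate hepatic function": [
        Hyponym("total bilirubin", "<=", 1.5, "x ULN"),
    ],
}


def expand(criterion_text: str) -> List[Hyponym]:
    """Return the computable hyponyms for a hypernym phrase, or [] if unknown."""
    key = criterion_text.strip().lower()
    return HYPERNYM_MAP.get(key, [])


if __name__ == "__main__":
    for h in expand("Adequate renal function"):
        print(f"{h.attribute} {h.operator} {h.value} {h.unit}")
```

In a real knowledge base the mapping would be derived from the annotated corpus rather than written by hand; the table form is used here only to make the hypernym-to-hyponym idea concrete.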

RESULTS

We manually annotated the clinical trial eligibility corpus (485/3281, 14.78% trials) and constructed an eligibility criteria-specific ontology. Our customized NLP pipeline, developed based on the eligibility criteria-specific ontology that we created through manual annotation, achieved high precision (0.91, range 0.67-1.00) and recall (0.79, range 0.50-1) scores, as well as a high F-score (0.83, range 0.67-1), enabling the efficient extraction of granular criteria entities and relevant attributes from 3281 clinical trials. A standardized eligibility criteria knowledge base, compatible with electronic health records, was developed by transforming hypernym concepts into machine-interpretable hyponyms along with their corresponding values. In addition, an interface prototype demonstrated the practicality of leveraging real-world data for optimizing clinical trial protocols and identifying eligible patients.
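
As a quick check on how the three reported metrics relate, the F-score is the harmonic mean of precision and recall. The sketch below uses the standard F-beta formula; the `f_score` function name is ours. Applied to the reported averages (precision 0.91, recall 0.79) it gives about 0.85, slightly above the reported 0.83. One plausible explanation, which the abstract does not confirm, is that the reported F-score was averaged across entity types rather than computed from the averaged precision and recall.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Average precision and recall as reported in the abstract.
    print(round(f_score(0.91, 0.79), 2))
```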

CONCLUSIONS

Our customized NLP pipeline successfully generated a standardized eligibility criteria knowledge base by transforming hypernym criteria into machine-readable hyponyms along with their corresponding values. A prototype interface integrating real-world patient information allows us to assess the impact of each eligibility criterion on the number of patients eligible for the trial. Leveraging NLP and real-world data in a data-driven approach holds promise for streamlining the overall clinical trial process, optimizing processes, and improving efficiency in patient identification.
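
Assessing the impact of each eligibility criterion on the number of eligible patients can be sketched as a leave-one-out count: apply all criteria to a cohort, then relax one criterion at a time and see how many additional patients qualify. This is an illustrative sketch only, not the prototype interface described above; the patient records, field names, criteria, and thresholds are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, float]
Criterion = Callable[[Patient], bool]


def eligible_count(patients: List[Patient], criteria: List[Criterion]) -> int:
    """Count patients who satisfy every criterion in the list."""
    return sum(all(rule(p) for rule in criteria) for p in patients)


def criterion_impact(
    patients: List[Patient], criteria: Dict[str, Criterion]
) -> Tuple[int, Dict[str, int]]:
    """Return the baseline eligible count and, for each criterion,
    how many extra patients become eligible when it alone is dropped."""
    baseline = eligible_count(patients, list(criteria.values()))
    gains: Dict[str, int] = {}
    for name in criteria:
        relaxed = [rule for other, rule in criteria.items() if other != name]
        gains[name] = eligible_count(patients, relaxed) - baseline
    return baseline, gains


# Hypothetical toy cohort and criteria (not data or thresholds from the study).
PATIENTS: List[Patient] = [
    {"age": 64, "ecog": 0, "creatinine": 1.0},
    {"age": 71, "ecog": 2, "creatinine": 1.1},
    {"age": 58, "ecog": 1, "creatinine": 1.9},
    {"age": 80, "ecog": 3, "creatinine": 2.2},
]
CRITERIA: Dict[str, Criterion] = {
    "age >= 18": lambda p: p["age"] >= 18,
    "ECOG <= 1": lambda p: p["ecog"] <= 1,
    "creatinine <= 1.5 mg/dL": lambda p: p["creatinine"] <= 1.5,
}

if __name__ == "__main__":
    baseline, gains = criterion_impact(PATIENTS, CRITERIA)
    print("eligible with all criteria:", baseline)
    for name, gain in gains.items():
        print(f"dropping '{name}' adds {gain} patient(s)")
```

A criterion whose removal adds many patients is a candidate for review by the trial designers; whether relaxing it is clinically acceptable is a separate question that a count alone cannot answer.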

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/035f/11319878/84ea0256abf7/ai_v3i1e50800_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/035f/11319878/adc3f97aba29/ai_v3i1e50800_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/035f/11319878/050610dc7f71/ai_v3i1e50800_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/035f/11319878/ece3384e73e7/ai_v3i1e50800_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/035f/11319878/afba041bf46e/ai_v3i1e50800_fig5.jpg
