
Optimizing Clinical Trial Eligibility Design Using Natural Language Processing Models and Real-World Data: Algorithm Development and Validation.

Authors

Lee Kyeryoung, Liu Zongzhi, Mai Yun, Jun Tomi, Ma Meng, Wang Tongyu, Ai Lei, Calay Ediz, Oh William, Stolovitzky Gustavo, Schadt Eric, Wang Xiaoyan

Affiliations

GeneDx (Sema4), Stamford, CT, United States.

Icahn School of Medicine at Mount Sinai, New York, NY, United States.

Publication information

JMIR AI. 2024 Jul 29;3:e50800. doi: 10.2196/50800.

Abstract

BACKGROUND

Clinical trials are vital for developing new therapies but can also delay drug development. Efficient trial data management, optimized trial protocols, and accurate patient identification are critical for shortening trial timelines. Natural language processing (NLP) has the potential to support all three.

OBJECTIVE

This study aims to assess the feasibility of using data-driven approaches to optimize clinical trial protocol design and identify eligible patients. This involves creating a comprehensive eligibility criteria knowledge base integrated within electronic health records using deep learning-based NLP techniques.

METHODS

We obtained data on 3281 industry-sponsored phase 2 or 3 interventional clinical trials from ClinicalTrials.gov, registered between 2013 and 2020 and recruiting patients with non-small cell lung cancer, prostate cancer, breast cancer, multiple myeloma, ulcerative colitis, or Crohn disease. A customized NLP pipeline based on bidirectional long short-term memory and conditional random field models was used to extract all eligibility criteria attributes and convert hypernym concepts into computable hyponyms along with their corresponding values. As a pilot study illustrating how clinical trial design can be simulated for optimization, we selected a subset of patients with non-small cell lung cancer (n=2775) curated from the Mount Sinai Health System.
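The hypernym-to-hyponym conversion described above can be pictured with a minimal sketch. It assumes a hand-written lookup table; the criterion names, attribute names, and thresholds below are hypothetical placeholders, not values from the study, and the study's own pipeline learns its extractions with a neural model rather than a fixed dictionary.

```python
from dataclasses import dataclass

# Hypothetical mapping from a broad (hypernym) criterion to specific,
# checkable (hyponym) tests. Thresholds are illustrative placeholders only.
HYPERNYM_TO_HYPONYMS = {
    "adequate renal function": [
        ("creatinine_clearance_ml_min", ">=", 50.0),
    ],
    "adequate hepatic function": [
        ("alt_x_uln", "<=", 2.5),
        ("total_bilirubin_x_uln", "<=", 1.5),
    ],
}


@dataclass(frozen=True)
class ComputableCriterion:
    """One machine-checkable test: attribute, comparison operator, value."""
    attribute: str
    operator: str
    value: float


def expand(criterion_text: str) -> list[ComputableCriterion]:
    """Return the computable hyponym tests for a hypernym criterion.

    Unknown criteria yield an empty list instead of raising an error.
    """
    key = criterion_text.strip().lower()
    return [ComputableCriterion(*row) for row in HYPERNYM_TO_HYPONYMS.get(key, [])]


for test in expand("Adequate hepatic function"):
    print(test.attribute, test.operator, test.value)
```

Each expanded test names a single attribute that could, in principle, be matched against a structured field in an electronic health record, which is what makes the criterion computable.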

RESULTS

We manually annotated the clinical trial eligibility corpus (485/3281, 14.78% of trials) and constructed an eligibility criteria-specific ontology. Our customized NLP pipeline, built on this ontology, achieved high precision (0.91, range 0.67-1.00) and recall (0.79, range 0.50-1.00) scores, as well as a high F-score (0.83, range 0.67-1.00), enabling the efficient extraction of granular criteria entities and relevant attributes from 3281 clinical trials. A standardized eligibility criteria knowledge base, compatible with electronic health records, was developed by transforming hypernym concepts into machine-interpretable hyponyms along with their corresponding values. In addition, an interface prototype demonstrated the practicality of leveraging real-world data for optimizing clinical trial protocols and identifying eligible patients.
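For readers unfamiliar with the metrics reported above, the following sketch shows how precision, recall, and F-score are commonly computed for entity extraction. It assumes exact-match scoring of (start, end, label) spans; the abstract does not state the matching rule or how the summary scores were aggregated, and the spans and labels here are invented for illustration.

```python
def precision_recall_f1(gold: set, predicted: set) -> tuple[float, float, float]:
    """Entity-level scores under exact span-and-label matching (an assumption)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical annotations: (start token, end token, entity label).
gold = {(0, 3, "LAB"), (5, 7, "VALUE"), (9, 10, "DRUG"), (12, 14, "STAGE")}
predicted = {(0, 3, "LAB"), (5, 7, "VALUE"), (9, 10, "STAGE")}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case the third prediction has the right span but the wrong label, so it counts against precision, while the two gold entities that were never matched lower recall.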

CONCLUSIONS

Our customized NLP pipeline successfully generated a standardized eligibility criteria knowledge base by transforming hypernym criteria into machine-readable hyponyms along with their corresponding values. A prototype interface integrating real-world patient information allows us to assess the impact of each eligibility criterion on the number of patients eligible for the trial. Combining NLP with real-world data in a data-driven approach holds promise for streamlining the overall clinical trial process and improving the efficiency of patient identification.
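Assessing the impact of each eligibility criterion on the eligible patient count can be sketched as a leave-one-out comparison: count patients who meet every criterion, then recount with one criterion removed. The patient records, attribute names, and thresholds below are hypothetical, and treating a missing value as "criterion not met" is an assumption of this sketch, not a rule taken from the study.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}

# Hypothetical patient records standing in for real-world data.
patients = [
    {"id": 1, "ecog": 0, "creatinine_clearance": 80, "age": 64},
    {"id": 2, "ecog": 2, "creatinine_clearance": 75, "age": 70},
    {"id": 3, "ecog": 1, "creatinine_clearance": 45, "age": 58},
    {"id": 4, "ecog": 1, "creatinine_clearance": 90, "age": 71},
    {"id": 5, "ecog": 3, "creatinine_clearance": 40, "age": 66},
]

# Hypothetical computable criteria: name -> (attribute, operator, value).
criteria = {
    "ECOG <= 1": ("ecog", "<=", 1),
    "CrCl >= 60": ("creatinine_clearance", ">=", 60),
    "Age >= 18": ("age", ">=", 18),
}


def meets(patient: dict, rule: tuple) -> bool:
    """True if the patient has the attribute and it satisfies the rule."""
    attribute, op, value = rule
    return attribute in patient and OPS[op](patient[attribute], value)


def eligible_count(cohort: list, rules: list) -> int:
    """Number of patients who satisfy every rule."""
    return sum(all(meets(p, rule) for rule in rules) for p in cohort)


def criterion_impact(cohort: list, rules_by_name: dict) -> tuple[int, dict]:
    """Baseline eligible count and the gain from dropping each criterion."""
    baseline = eligible_count(cohort, list(rules_by_name.values()))
    gains = {}
    for name in rules_by_name:
        relaxed = [r for n, r in rules_by_name.items() if n != name]
        gains[name] = eligible_count(cohort, relaxed) - baseline
    return baseline, gains


baseline, gains = criterion_impact(patients, criteria)
print(baseline, gains)  # 2 {'ECOG <= 1': 1, 'CrCl >= 60': 1, 'Age >= 18': 0}
```

A gain of zero, as for the age criterion here, suggests a criterion that does not restrict the cohort, whereas a positive gain marks one that a protocol designer might consider relaxing.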
