
Criteria2Query 3.0: Leveraging generative large language models for clinical trial eligibility query generation.

Affiliations

Department of Biomedical Informatics, Columbia University, New York, United States.

Columbia University Vagelos College of Physicians and Surgeons, New York, United States.

Publication information

J Biomed Inform. 2024 Jun;154:104649. doi: 10.1016/j.jbi.2024.104649. Epub 2024 Apr 30.

Abstract

OBJECTIVE

Automated identification of eligible patients is a bottleneck of clinical research. We propose Criteria2Query (C2Q) 3.0, a system that leverages GPT-4 for the semi-automatic transformation of clinical trial eligibility criteria text into executable clinical database queries.

MATERIALS AND METHODS

C2Q 3.0 integrated three GPT-4 prompts for concept extraction, SQL query generation, and reasoning. Each prompt was designed and evaluated separately. The concept extraction prompt was benchmarked against manual annotations from 20 clinical trials by two evaluators, who subsequently also measured SQL generation accuracy and categorized errors in GPT-generated SQL queries from 5 clinical trials. The reasoning prompt was assessed by three evaluators on four metrics (readability, correctness, coherence, and usefulness), using corrected SQL queries and an open-ended feedback questionnaire.
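The three-prompt chain described above can be sketched as a simple sequential pipeline. The sketch below is illustrative only, assuming an injected LLM callable (e.g., a wrapper around a GPT-4 chat-completion call); the prompt wordings and the `criteria_to_query` function are hypothetical, not the paper's actual prompts.

```python
from typing import Callable

# Hypothetical prompt templates for the three stages; the paper's actual
# prompts are not reproduced here.
CONCEPT_PROMPT = (
    "Extract eligibility concepts (conditions, drugs, measurements, "
    "demographics) from the following clinical trial criteria:\n{criteria}"
)
SQL_PROMPT = (
    "Generate an executable SQL query over a clinical database that selects "
    "patients matching these extracted concepts:\n{concepts}"
)
REASONING_PROMPT = (
    "Explain, step by step, how the following SQL query implements the "
    "eligibility criteria:\n{sql}"
)

def criteria_to_query(criteria: str, llm: Callable[[str], str]) -> dict:
    """Run the three-stage chain: concepts -> SQL -> reasoning.

    `llm` maps a prompt string to a model completion; in a real deployment
    it would call GPT-4, here it is left pluggable so the chain can be
    exercised with a stub.
    """
    concepts = llm(CONCEPT_PROMPT.format(criteria=criteria))
    sql = llm(SQL_PROMPT.format(concepts=concepts))
    reasoning = llm(REASONING_PROMPT.format(sql=sql))
    return {"concepts": concepts, "sql": sql, "reasoning": reasoning}

# Example with a stub LLM (a real system would substitute a GPT-4 call):
stub = lambda prompt: "[model output for] " + prompt
result = criteria_to_query("Age >= 18 and diagnosed with type 2 diabetes", stub)
```

Keeping the model call pluggable also mirrors the paper's semi-automatic workflow: each stage's output can be inspected and corrected by a human before being fed to the next prompt.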

RESULTS

Out of 518 concepts from 20 clinical trials, GPT-4 achieved an F1-score of 0.891 in concept extraction. For SQL generation, 29 errors spanning seven categories were detected, with logic errors being the most common (n = 10; 34.48%). Reasoning evaluations yielded high coherence (mean score 4.70) but relatively lower readability (mean 3.95); mean scores for correctness and usefulness were 3.97 and 4.37, respectively.

CONCLUSION

GPT-4 significantly improves the accuracy of extracting clinical trial eligibility criteria concepts in C2Q 3.0. Continued research is warranted to ensure the reliability of large language models.

