Information Extraction from Lumbar Spine MRI Radiology Reports Using GPT4: Accuracy and Benchmarking Against Research-Grade Comprehensive Scoring.

Authors

Ziegeler Katharina, Kreutzinger Virginie, Tong Michelle W, Chin Cynthia T, Bahroos Emma, Wu Po-Hung, Bonnheim Noah, Fields Aaron J, Lotz Jeffrey C, Link Thomas M, Majumdar Sharmila

Affiliations

Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA 94143, USA.

Department of Bioengineering, University of California Berkeley, Berkeley, CA 94720, USA.

Publication information

Diagnostics (Basel). 2025 Apr 4;15(7):930. doi: 10.3390/diagnostics15070930.

Abstract

Objectives: This study aimed to create a pipeline for standardized data extraction from lumbar spine MRI radiology reports using a large language model (LLM) and to assess the agreement of the extracted data with research-grade semi-quantitative scoring. Methods: We included a subset of data from a multi-site, NIH-funded cohort study of participants with chronic low back pain (cLBP). After initial prompt development, a secure application programming interface (API) deployment of OpenAI's GPT-4 was used to extract different classes of pathology from the clinical radiology report. Unsupervised UMAP and agglomerative clustering of the pathology terms' embeddings provided insight into model comprehension for optimized prompt design. Model extraction was benchmarked against human extraction (gold standard) with F1 scores and false-positive and false-negative rates (FPR/FNR). Then, an expert MSK radiologist provided comprehensive research-grade scores of the images, and agreement with report-extracted data was calculated using Cohen's kappa. Results: Data from 230 patients with cLBP were included (mean age 53.2 years, 54% women). The overall model performance for extracting data from clinical reports was excellent, with a mean F1 score of 0.96 across pathologies. The mean FPR was marginally higher than the mean FNR (5.1% vs. 3.0%). Agreement with comprehensive scoring was moderate (kappa 0.424), and the underreporting of lateral recess stenosis (FNR 63.6%) and overreporting of disc pathology (FPR 42.7%) were noted. Conclusions: LLMs can accurately extract highly detailed information on lumbar spine imaging pathologies from radiology reports. Moderate agreement between the LLM and comprehensive scores underscores the need for less subjective, machine-based data extraction from imaging.
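The benchmarking metrics named in the abstract (F1, FPR, FNR, Cohen's kappa) can be illustrated with a minimal sketch. It assumes each pathology is coded as a binary present/absent label per report and uses the standard textbook definitions; the abstract does not specify the authors' exact computation, and the function names and toy labels below are hypothetical.

```python
from collections import Counter


def confusion(reference, predicted):
    """Count (tp, fp, fn, tn) for two equal-length lists of 0/1 labels."""
    counts = Counter(zip(reference, predicted))
    return counts[(1, 1)], counts[(0, 1)], counts[(1, 0)], counts[(0, 0)]


def extraction_metrics(reference, predicted):
    """F1, false-positive rate, and false-negative rate (standard definitions)."""
    tp, fp, fn, tn = confusion(reference, predicted)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return {"f1": f1, "fpr": fpr, "fnr": fnr}


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two binary raters: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    tp, fp, fn, tn = confusion(rater_a, rater_b)
    p_observed = (tp + tn) / n
    p_a_pos = (tp + fn) / n  # share of cases rater A marks positive
    p_b_pos = (tp + fp) / n  # share of cases rater B marks positive
    p_expected = p_a_pos * p_b_pos + (1 - p_a_pos) * (1 - p_b_pos)
    if p_expected == 1:
        return 1.0  # both raters constant and identical
    return (p_observed - p_expected) / (1 - p_expected)


# Toy labels (invented for illustration, not study data).
human = [1, 1, 1, 0, 0, 0, 0, 1]
model = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = extraction_metrics(human, model)
kappa = cohen_kappa(human, model)
```

In this sketch the first argument plays the role of the reference standard: human extraction when benchmarking the model, or the radiologist's research-grade score when comparing against report-extracted data.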
