Large language models outperform traditional natural language processing methods in extracting patient-reported outcomes in IBD.

Authors

Patel Perseus V, Davis Conner, Ralbovsky Amariel, Tinoco Daniel, Williams Christopher Y K, Slatter Shadera, Naderalvojoud Behzad, Rosen Michael J, Hernandez-Boussard Tina, Rudrapatna Vivek

Affiliations

Department of Pediatrics, University of California San Francisco, San Francisco, CA.

Division of Pediatric Gastroenterology, Stanford University School of Medicine, Palo Alto, CA.

Publication information

medRxiv. 2024 Sep 6:2024.09.05.24313139. doi: 10.1101/2024.09.05.24313139.

DOI: 10.1101/2024.09.05.24313139
PMID: 39281744
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11398594/

Abstract

BACKGROUND AND AIMS

Patient-reported outcomes (PROs) are vital in assessing disease activity and treatment outcomes in inflammatory bowel disease (IBD). However, manual extraction of these PROs from the free-text of clinical notes is burdensome. We aimed to improve data curation from free-text information in the electronic health record, making it more available for research and quality improvement. This study aimed to compare traditional natural language processing (tNLP) and large language models (LLMs) in extracting three IBD PROs (abdominal pain, diarrhea, fecal blood) from clinical notes across two institutions.

METHODS

Clinic notes were annotated for each PRO using preset protocols. Models were developed and internally tested at the University of California San Francisco (UCSF), and then externally validated at Stanford University. We compared tNLP and LLM-based models on accuracy, sensitivity, specificity, positive and negative predictive value. Additionally, we conducted fairness and error assessments.
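The five comparison metrics named above can all be derived from a single confusion matrix. The following is a minimal sketch, assuming each PRO is scored as a binary label (present or absent) per note; the abstract does not state the label scheme, and the names `BinaryMetrics` and `binary_metrics` are hypothetical and not taken from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a class is absent from the sample.
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> BinaryMetrics:
    """Compare model labels (y_pred) with annotator labels (y_true).

    Assumption: one binary label per note for a single PRO.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives

    return BinaryMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        sensitivity=_ratio(tp, tp + fn),  # recall on notes where the PRO is present
        specificity=_ratio(tn, tn + fp),  # recall on notes where the PRO is absent
        ppv=_ratio(tp, tp + fp),          # positive predictive value
        npv=_ratio(tn, tn + fn),          # negative predictive value
    )
```

Unlike sensitivity and specificity, positive and negative predictive values depend on how often the PRO appears in the sample, so they may shift between institutions even when the model behaves identically.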

RESULTS

Inter-rater reliability between annotators was >90%. On the UCSF test set (n=50), the top-performing tNLP models showcased accuracies of 92% (abdominal pain), 82% (diarrhea) and 80% (fecal blood), comparable to GPT-4, which was 96%, 88%, and 90% accurate, respectively. On external validation at Stanford (n=250), tNLP models failed to generalize (61-62% accuracy) while GPT-4 maintained accuracies >90%. PaLM-2 and GPT-4 showed similar performance. No biases were detected based on demographics or diagnosis.
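With only 50 notes in the UCSF test set, each note is worth two percentage points of accuracy, which helps explain why 92% and 96% are described as comparable. The sketch below computes a Wilson score interval for a proportion as one illustrative way to gauge that uncertainty. It is an assumption for illustration only: the abstract does not say which statistical method the authors used, and the counts 46/50 and 48/50 are back-calculated here from the reported percentages.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("require total > 0 and 0 <= correct <= total")

    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Abdominal pain on the UCSF test set (n=50); counts inferred from reported accuracies.
tnlp_low, tnlp_high = wilson_interval(46, 50)  # 92% accuracy (top tNLP model)
gpt4_low, gpt4_high = wilson_interval(48, 50)  # 96% accuracy (GPT-4)
print(f"tNLP:  {tnlp_low:.3f} to {tnlp_high:.3f}")
print(f"GPT-4: {gpt4_low:.3f} to {gpt4_high:.3f}")
```

Overlapping intervals are not a formal hypothesis test, but they indicate that a two-note difference on 50 notes is weak evidence of a real gap. The much larger drop for tNLP on the 250-note Stanford set is a different matter.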

CONCLUSIONS

LLMs are accurate and generalizable methods for extracting PROs. They maintain excellent accuracy across institutions, despite heterogeneity in note templates and authors. Widespread adoption of such tools has the potential to enhance IBD research and patient care.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50ae/11398594/36c9da68f54d/nihpp-2024.09.05.24313139v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50ae/11398594/68244ffc22db/nihpp-2024.09.05.24313139v1-f0001.jpg
