Accuracy of a Proprietary Large Language Model in Labeling Obstetric Incident Reports.

Publication information

Jt Comm J Qual Patient Saf. 2024 Dec;50(12):877-881. doi: 10.1016/j.jcjq.2024.08.001. Epub 2024 Aug 6.

DOI:10.1016/j.jcjq.2024.08.001
PMID:39256071
Abstract

BACKGROUND

Using data collected through incident reporting systems is challenging because it consists of a large volume of primarily qualitative information. Large language models (LLMs), such as ChatGPT, provide novel capabilities in text summarization and labeling that could support safety data trending and early identification of opportunities to prevent patient harm. This study assessed the capability of a proprietary LLM (GPT-3.5) to automatically label a cross-sectional sample of real-world obstetric incident reports.

METHODS

A sample of 370 incident reports submitted to inpatient obstetric units between December 2022 and May 2023 was extracted. Human-annotated labels were assigned by a clinician reviewer and considered the gold standard. The LLM was prompted to label incident reports relying solely on its pretrained knowledge and information included in the prompt. Primary outcomes assessed were sensitivity, specificity, positive predictive value, and negative predictive value. A secondary outcome assessed the human-perceived quality of the model's justification for the label(s) applied.
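The four primary outcomes are standard confusion-matrix ratios: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), PPV = TP/(TP+FP), and NPV = TN/(TN+FN). The following is a minimal Python sketch, assuming each (report, label) pair is scored as one binary decision against the reviewer's labels. The abstract does not say how multi-label results were aggregated, and `Confusion`, `confusion_from_labels`, and the labels "A", "B", "C" are hypothetical placeholders, not the study's code or label taxonomy.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a denominator is zero.
    return num / den if den else math.nan


@dataclass(frozen=True)
class Confusion:
    tp: int  # model and reviewer both applied the label
    fp: int  # model applied the label, reviewer did not
    fn: int  # reviewer applied the label, model did not
    tn: int  # neither applied the label

    def sensitivity(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        return _ratio(self.tn, self.tn + self.fp)

    def ppv(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def npv(self) -> float:
        return _ratio(self.tn, self.tn + self.fn)


def confusion_from_labels(
    model: dict[str, set[str]],
    reviewer: dict[str, set[str]],
    label_set: set[str],
) -> Confusion:
    """Score every (report, label) pair as one binary decision.

    Assumption: reviewer labels are the gold standard, and only reports
    the reviewer annotated are scored.
    """
    tp = fp = fn = tn = 0
    for report_id, gold in reviewer.items():
        predicted = model.get(report_id, set())
        for label in label_set:
            in_pred, in_gold = label in predicted, label in gold
            if in_pred and in_gold:
                tp += 1
            elif in_pred:
                fp += 1
            elif in_gold:
                fn += 1
            else:
                tn += 1
    return Confusion(tp, fp, fn, tn)


# Hypothetical toy data: three reports, three placeholder labels.
reviewer = {"r1": {"A"}, "r2": set(), "r3": {"B", "C"}}
model = {"r1": {"A", "B"}, "r2": set(), "r3": {"B"}}
counts = confusion_from_labels(model, reviewer, {"A", "B", "C"})
print(counts)
print(f"sens={counts.sensitivity():.3f} spec={counts.specificity():.3f} "
      f"ppv={counts.ppv():.3f} npv={counts.npv():.3f}")
```

On the toy data this gives 2 true positives, 1 false positive, 1 false negative, and 5 true negatives. Under this pair-wise scoring, labels that are absent from most reports produce many true negatives, so specificity and NPV can be high even when PPV is modest.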

RESULTS

The LLM demonstrated the ability to label incident reports with high sensitivity and specificity. The model applied a total of 79 labels compared to the reviewer's 49 labels. Overall sensitivity for the model was 85.7%, and specificity was 97.9%. Positive and negative predictive values were 53.2% and 99.6%, respectively. For 60.8% of the labels the model applied, the reviewer approved of the model's justification for applying the label.
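The reported figures are internally consistent. Reading the reviewer's 49 labels as the actual positives and the model's 79 labels as the predicted positives (an interpretation the abstract implies but does not spell out), both sensitivity and PPV correspond to 42 true positives. That leaves 7 false negatives (49 - 42) and 37 false positives (79 - 42). The 60.8% approval rate likewise corresponds to 48 of the 79 model labels. Specificity and NPV cannot be reproduced this way, because the abstract does not report the true-negative count.

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN} = \frac{42}{42 + 7} = \frac{42}{49} \approx 85.7\% \\
\text{PPV} &= \frac{TP}{TP + FP} = \frac{42}{42 + 37} = \frac{42}{79} \approx 53.2\% \\
\text{Approved justifications} &= \frac{48}{79} \approx 60.8\%
\end{aligned}
```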

CONCLUSION

The proprietary LLM demonstrated the ability to label obstetric incident reports with high sensitivity and specificity. LLMs offer the potential to enable more efficient use of data from incident reporting systems.

Similar articles

1. Accuracy of a Proprietary Large Language Model in Labeling Obstetric Incident Reports.
   Jt Comm J Qual Patient Saf. 2024 Dec;50(12):877-881. doi: 10.1016/j.jcjq.2024.08.001. Epub 2024 Aug 6.
2. Open-source Large Language Models can Generate Labels from Radiology Reports for Training Convolutional Neural Networks.
   Acad Radiol. 2025 May;32(5):2402-2410. doi: 10.1016/j.acra.2024.12.028. Epub 2025 Jan 6.
3. A dataset and benchmark for hospital course summarization with adapted large language models.
   J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
4. Use of ChatGPT Large Language Models to Extract Details of Recommendations for Additional Imaging From Free-Text Impressions of Radiology Reports.
   AJR Am J Roentgenol. 2025 Apr;224(4):e2432341. doi: 10.2214/AJR.24.32341. Epub 2025 Jan 29.
5. Assessing the Application of Large Language Models in Generating Dermatologic Patient Education Materials According to Reading Level: Qualitative Study.
   JMIR Dermatol. 2024 May 16;7:e55898. doi: 10.2196/55898.
6. Automated Radiology Report Labeling in Chest X-Ray Pathologies: Development and Evaluation of a Large Language Model Framework.
   JMIR Med Inform. 2025 Mar 28;13:e68618. doi: 10.2196/68618.
7. The Accuracy and Capability of Artificial Intelligence Solutions in Health Care Examinations and Certificates: Systematic Review and Meta-Analysis.
   J Med Internet Res. 2024 Nov 5;26:e56532. doi: 10.2196/56532.
8. Large Language Model Applications for Health Information Extraction in Oncology: Scoping Review.
   JMIR Cancer. 2025 Mar 28;11:e65984. doi: 10.2196/65984.
9. Construction of a Multi-Label Classifier for Extracting Multiple Incident Factors From Medication Incident Reports in Residential Care Facilities: Natural Language Processing Approach.
   JMIR Med Inform. 2024 Jul 23;12:e58141. doi: 10.2196/58141.
10. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
   Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Cited by

1. Natural Language Processing: Set to Transform Pediatric Research.
   Hosp Pediatr. 2025 Jan 1;15(1):e12-e14. doi: 10.1542/hpeds.2024-008115.