Zero-shot Learning with Minimum Instruction to Extract Social Determinants and Family History from Clinical Notes using GPT Model.

Authors

Bhate Neel Jitesh, Mittal Ansh, He Zhe, Luo Xiao

Affiliations

Department of CIS, IUPUI, Indianapolis, IN, USA.

School of Information, Florida State University, Tallahassee, FL, USA.

Publication information

Proc IEEE Int Conf Big Data. 2023 Dec;2023:1476-1480. doi: 10.1109/BigData59044.2023.10386811.

DOI:10.1109/BigData59044.2023.10386811
PMID:39101057
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11295958/
Abstract

Demographics, social determinants of health, and family history documented in the unstructured text of electronic health records are increasingly being studied to understand how this information can be combined with structured data to improve healthcare outcomes. Since the release of the GPT models, many studies have applied them to extract this information from narrative clinical notes. Unlike existing work, our research investigates zero-shot learning for extracting all of this information together while providing minimal information to the GPT model. We use de-identified real-world clinical notes annotated for demographics, various social determinants, and family history information. Because the GPT model might return text that differs from the text in the original data, we explore two sets of evaluation metrics, traditional NER evaluation metrics and semantic similarity evaluation metrics, to understand the performance fully. Our results show that the GPT-3.5 method achieved an average F1 of 0.975 on demographics extraction, 0.615 on social determinants extraction, and 0.722 on family history extraction. We believe these results can be further improved through model fine-tuning or few-shot learning. Through case studies, we also identified limitations of the GPT models that need to be addressed in future research.
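The abstract contrasts two ways of scoring extracted text: traditional NER metrics and semantic similarity metrics. The following is a minimal sketch of that contrast, not the authors' evaluation code. The abstract does not say which similarity measure was used, so a token-overlap (Jaccard) score with an arbitrary 0.5 threshold stands in for semantic similarity here. The function names and the toy gold/predicted phrases are assumptions made up for illustration.

```python
from typing import Callable, List, Tuple


def token_jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def exact_match(gold: str, pred: str) -> bool:
    """Strict matching: strings must be identical after trimming and lowercasing."""
    return gold.strip().lower() == pred.strip().lower()


def relaxed_match(gold: str, pred: str, threshold: float = 0.5) -> bool:
    """Relaxed matching: accept a prediction whose similarity reaches the threshold."""
    return token_jaccard(gold, pred) >= threshold


def precision_recall_f1(
    gold: List[str], pred: List[str], match: Callable[[str, str], bool]
) -> Tuple[float, float, float]:
    """Greedily pair each prediction with at most one unused gold item, then score."""
    unmatched = list(gold)
    true_positives = 0
    for p in pred:
        for g in unmatched:
            if match(g, p):
                unmatched.remove(g)  # each gold item can be matched only once
                true_positives += 1
                break
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical annotations and model output (invented for this sketch).
gold_items = ["mother had breast cancer", "smokes one pack per day"]
pred_items = ["mother had breast cancer", "smokes a pack per day", "lives alone"]

strict_scores = precision_recall_f1(gold_items, pred_items, exact_match)
relaxed_scores = precision_recall_f1(gold_items, pred_items, relaxed_match)
print("strict  P/R/F1:", strict_scores)
print("relaxed P/R/F1:", relaxed_scores)
```

In this toy case the paraphrased smoking statement counts as a miss under strict matching and as a hit under relaxed matching, which is the kind of gap a second metric set is meant to expose when a generative model rewords the source text.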

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d66/11295958/f5917ecc4add/nihms-2013006-f0001.jpg
