
Generalized and transferable patient language representation for phenotyping with limited data.

Author Information

Si Yuqi, Bernstam Elmer V, Roberts Kirk

Affiliations

School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA.

School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA; Division of General Internal Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, TX, USA.

Publication Information

J Biomed Inform. 2021 Apr;116:103726. doi: 10.1016/j.jbi.2021.103726. Epub 2021 Mar 9.

DOI: 10.1016/j.jbi.2021.103726
PMID: 33711541
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11577729/
Abstract

The paradigm of representation learning through transfer learning has the potential to greatly enhance clinical natural language processing. In this work, we propose a multi-task pre-training and fine-tuning approach for learning generalized and transferable patient representations from medical language. The model is first pre-trained with different but related high-prevalence phenotypes and further fine-tuned on downstream target tasks. Our main contribution focuses on the impact this technique can have on low-prevalence phenotypes, a challenging task due to the dearth of data. We validate the representation from pre-training, and fine-tune the multi-task pre-trained models on low-prevalence phenotypes including 38 circulatory diseases, 23 respiratory diseases, and 17 genitourinary diseases. We find multi-task pre-training increases learning efficiency and achieves consistently high performance across the majority of phenotypes. Most important, the multi-task pre-training is almost always either the best-performing model or performs tolerably close to the best-performing model, a property we refer to as robust. All these results lead us to conclude that this multi-task transfer learning architecture is a robust approach for developing generalized and transferable patient language representations for numerous phenotypes.

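The approach summarized in the abstract, pre-training a shared patient representation on several related high-prevalence phenotypes and then fine-tuning it on a low-prevalence target phenotype with limited labels, follows a standard multi-task transfer-learning pattern. The sketch below illustrates that pattern only; it is not the authors' implementation, and the encoder architecture, feature format, layer sizes, and all names are illustrative assumptions.

```python
# Minimal sketch of multi-task pre-training followed by fine-tuning on a
# low-prevalence phenotype. Everything here (bag-of-words encoder, sizes,
# synthetic data) is an assumption for illustration, not the paper's setup.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Maps a note feature vector to a dense patient representation."""
    def __init__(self, vocab_size: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())

    def forward(self, x):
        return self.net(x)

class MultiTaskModel(nn.Module):
    """Shared encoder with one binary head per high-prevalence phenotype."""
    def __init__(self, vocab_size: int, num_pretrain_tasks: int, hidden: int = 128):
        super().__init__()
        self.encoder = SharedEncoder(vocab_size, hidden)
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(num_pretrain_tasks)])

    def forward(self, x):
        z = self.encoder(x)
        return torch.cat([head(z) for head in self.heads], dim=1)  # (batch, tasks)

# --- multi-task pre-training on high-prevalence phenotypes (synthetic data) ---
vocab, n_tasks = 500, 4
x = torch.rand(64, vocab)                      # stand-in for note features
y = torch.randint(0, 2, (64, n_tasks)).float() # one label column per phenotype
model = MultiTaskModel(vocab, n_tasks)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(5):
    opt.zero_grad()
    loss_fn(model(x), y).backward()            # joint loss over all pre-training tasks
    opt.step()

# --- fine-tuning on a low-prevalence target phenotype with limited labels ----
target_head = nn.Linear(128, 1)                # fresh head for the new phenotype
opt_ft = torch.optim.Adam(
    list(model.encoder.parameters()) + list(target_head.parameters()), lr=1e-4)
x_small = torch.rand(16, vocab)                # small labeled cohort
y_small = torch.randint(0, 2, (16, 1)).float()
for _ in range(5):
    opt_ft.zero_grad()
    loss_fn(target_head(model.encoder(x_small)), y_small).backward()
    opt_ft.step()
```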

Similar Articles

1. Generalized and transferable patient language representation for phenotyping with limited data. J Biomed Inform. 2021 Apr;116:103726. doi: 10.1016/j.jbi.2021.103726. Epub 2021 Mar 9.
2. Med7: A transferable clinical natural language processing model for electronic health records. Artif Intell Med. 2021 Aug;118:102086. doi: 10.1016/j.artmed.2021.102086. Epub 2021 May 18.
3. Drug knowledge discovery via multi-task learning and pre-trained models. BMC Med Inform Decis Mak. 2021 Nov 16;21(Suppl 9):251. doi: 10.1186/s12911-021-01614-7.
4. Comparing Pre-trained and Feature-Based Models for Prediction of Alzheimer's Disease Based on Speech. Front Aging Neurosci. 2021 Apr 27;13:635945. doi: 10.3389/fnagi.2021.635945. eCollection 2021.
5. DeBERTa-BiLSTM: A multi-label classification model of Arabic medical questions using pre-trained models and deep learning. Comput Biol Med. 2024 Mar;170:107921. doi: 10.1016/j.compbiomed.2024.107921. Epub 2024 Jan 4.
6. BioInstruct: instruction tuning of large language models for biomedical natural language processing. J Am Med Inform Assoc. 2024 Sep 1;31(9):1821-1832. doi: 10.1093/jamia/ocae122.
7. Medical text classification based on the discriminative pre-training model and prompt-tuning. Digit Health. 2023 Aug 6;9:20552076231193213. doi: 10.1177/20552076231193213. eCollection 2023 Jan-Dec.
8. A clinical text classification paradigm using weak supervision and deep representation. BMC Med Inform Decis Mak. 2019 Jan 7;19(1):1. doi: 10.1186/s12911-018-0723-6.
9. Leveraging pre-trained language models for mining microbiome-disease relationships. BMC Bioinformatics. 2023 Jul 19;24(1):290. doi: 10.1186/s12859-023-05411-z.
10. Identification of Semantically Similar Sentences in Clinical Notes: Iterative Intermediate Training Using Multi-Task Learning. JMIR Med Inform. 2020 Nov 27;8(11):e22508. doi: 10.2196/22508.

Cited By

1. Automated Shared Phenotype Discovery in Undiagnosed Cohorts for Rare Disease Research. Proc Int Conf Mach Learn Appl. 2024 Dec;2024:1025-1030. doi: 10.1109/icmla61862.2024.00154. Epub 2025 Mar 4.
2. Deep representation learning of patient data from Electronic Health Records (EHR): A systematic review. J Biomed Inform. 2021 Mar;115:103671. doi: 10.1016/j.jbi.2020.103671. Epub 2020 Dec 31.

References

1. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. NPJ Digit Med. 2021 May 20;4(1):86. doi: 10.1038/s41746-021-00455-y.
2. Deep representation learning of patient data from Electronic Health Records (EHR): A systematic review. J Biomed Inform. 2021 Mar;115:103671. doi: 10.1016/j.jbi.2020.103671. Epub 2020 Dec 31.
3. Language models are an effective representation learning technique for electronic health record data. J Biomed Inform. 2021 Jan;113:103637. doi: 10.1016/j.jbi.2020.103637. Epub 2020 Dec 5.
4. PheMap: a multi-resource knowledge base for high-throughput phenotyping within electronic health records. J Am Med Inform Assoc. 2020 Nov 1;27(11):1675-1687. doi: 10.1093/jamia/ocaa104.
5. The use of machine learning in rare diseases: a scoping review. Orphanet J Rare Dis. 2020 Jun 9;15(1):145. doi: 10.1186/s13023-020-01424-6.
6. Patient Representation Transfer Learning from Clinical Notes based on Hierarchical Attention Network. AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:597-606. eCollection 2020.
7. BEHRT: Transformer for Electronic Health Records. Sci Rep. 2020 Apr 28;10(1):7155. doi: 10.1038/s41598-020-62922-y.
8. Learning Hierarchical Representations of Electronic Health Records for Clinical Outcome Prediction. AMIA Annu Symp Proc. 2020 Mar 4;2019:597-606. eCollection 2019.
9. Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk. Sci Rep. 2020 Jan 24;10(1):1111. doi: 10.1038/s41598-020-58053-z.
10. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018 May 8;1:18. doi: 10.1038/s41746-018-0029-1. eCollection 2018.