Rethinking domain adaptation for machine learning over clinical language.

Authors

Laparra Egoitz, Bethard Steven, Miller Timothy A

Affiliations

School of Information, University of Arizona, Tucson, Arizona, USA.

Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA.

Publication information

JAMIA Open. 2020 Apr 13;3(2):146-150. doi: 10.1093/jamiaopen/ooaa010. eCollection 2020 Jul.

DOI: 10.1093/jamiaopen/ooaa010
PMID: 32734151
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7382626/

Abstract

Building clinical natural language processing (NLP) systems that work on widely varying data is an absolute necessity because of the expense of obtaining new training data. While domain adaptation research can have a positive impact on this problem, the most widely studied paradigms do not take into account the realities of clinical data sharing. To address this issue, we lay out a taxonomy of domain adaptation, parameterizing by what data is shareable. We show that the most realistic settings for clinical use cases are seriously under-studied. To support research in these important directions, we make a series of recommendations, not just for domain adaptation but for clinical NLP in general, that ensure that data, shared tasks, and released models are broadly useful, and that initiate research directions where the clinical NLP community can lead the broader NLP and machine learning fields.

Similar articles

1. Rethinking domain adaptation for machine learning over clinical language.
   JAMIA Open. 2020 Apr 13;3(2):146-150. doi: 10.1093/jamiaopen/ooaa010. eCollection 2020 Jul.
2. A Review of Recent Work in Transfer Learning and Domain Adaptation for Natural Language Processing of Electronic Health Records.
   Yearb Med Inform. 2021 Aug;30(1):239-244. doi: 10.1055/s-0041-1726522. Epub 2021 Sep 3.
3. A scoping review of publicly available language tasks in clinical natural language processing.
   J Am Med Inform Assoc. 2022 Sep 12;29(10):1797-1806. doi: 10.1093/jamia/ocac127.
4. Clinical Text Data in Machine Learning: Systematic Review.
   JMIR Med Inform. 2020 Mar 31;8(3):e17984. doi: 10.2196/17984.
5. CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines.
   J Am Med Inform Assoc. 2018 Mar 1;25(3):331-336. doi: 10.1093/jamia/ocx132.
6. Improving the robustness and accuracy of biomedical language models through adversarial training.
   J Biomed Inform. 2022 Aug;132:104114. doi: 10.1016/j.jbi.2022.104114. Epub 2022 Jun 15.
7. Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records.
   BMC Med Inform Decis Mak. 2023 Sep 18;23(1):188. doi: 10.1186/s12911-023-02271-8.
8. Recurrent Deep Network Models for Clinical NLP Tasks: Use Case with Sentence Boundary Disambiguation.
   Stud Health Technol Inform. 2019 Aug 21;264:198-202. doi: 10.3233/SHTI190211.
9. Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis.
   Yearb Med Inform. 2015 Aug 13;10(1):183-93. doi: 10.15265/IY-2015-009.
10. Annotated dataset creation through large language models for non-english medical NLP.
    J Biomed Inform. 2023 Sep;145:104478. doi: 10.1016/j.jbi.2023.104478. Epub 2023 Aug 23.

Cited by

1. Supervised Learning and Large Language Model Benchmarks on Mental Health Datasets: Cognitive Distortions and Suicidal Risks in Chinese Social Media.
   Bioengineering (Basel). 2025 Aug 19;12(8):882. doi: 10.3390/bioengineering12080882.
2. Toward Cross-Hospital Deployment of Natural Language Processing Systems: Model Development and Validation of Fine-Tuned Large Language Models for Disease Name Recognition in Japanese.
   JMIR Med Inform. 2025 Jul 8;13:e76773. doi: 10.2196/76773.
3. Tailoring task arithmetic to address bias in models trained on multi-institutional datasets.
   J Biomed Inform. 2025 Aug;168:104858. doi: 10.1016/j.jbi.2025.104858. Epub 2025 Jun 8.
4. Assessment of a zero-shot large language model in measuring documented goals-of-care discussions.
   medRxiv. 2025 May 25:2025.05.23.25328115. doi: 10.1101/2025.05.23.25328115.
5. Development and prospective implementation of a large language model based system for early sepsis prediction.
   NPJ Digit Med. 2025 May 17;8(1):290. doi: 10.1038/s41746-025-01689-w.
6. Development and Prospective Implementation of a Large Language Model based System for Early Sepsis Prediction.
   medRxiv. 2025 Mar 11:2025.03.07.25323589. doi: 10.1101/2025.03.07.25323589.
7. Bayesian Nested Latent Class Models for Cause-of-Death Assignment Using Verbal Autopsies Across Multiple Domains.
   Ann Appl Stat. 2024 Jun;18(2):1137-1159. doi: 10.1214/23-aoas1826. Epub 2024 Apr 5.
8. Probing Patient Messages Enhanced by Natural Language Processing: A Top-Down Message Corpus Analysis.
   Health Data Sci. 2021 May 18;2021:1504854. doi: 10.34133/2021/1504854. eCollection 2021.
9. Generalization of finetuned transformer language models to new clinical contexts.
   JAMIA Open. 2023 Aug 16;6(3):ooad070. doi: 10.1093/jamiaopen/ooad070. eCollection 2023 Oct.
10. Deep representation learning of patient data from Electronic Health Records (EHR): A systematic review.
    J Biomed Inform. 2021 Mar;115:103671. doi: 10.1016/j.jbi.2020.103671. Epub 2020 Dec 31.
