
Neural machine translation of clinical text: an empirical investigation into multilingual pre-trained language models and transfer-learning.

Authors

Han Lifeng, Gladkoff Serge, Erofeev Gleb, Sorokina Irina, Galiano Betty, Nenadic Goran

Affiliations

Department of Computer Science, The University of Manchester, Manchester, United Kingdom.

AI Lab, Logrus Global, Translation & Localization, Philadelphia, PA, United States.

Publication information

Front Digit Health. 2024 Feb 26;6:1211564. doi: 10.3389/fdgth.2024.1211564. eCollection 2024.

DOI:10.3389/fdgth.2024.1211564
PMID:38468693
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10926203/
Abstract

Clinical texts and documents contain rich healthcare information and knowledge, and processing them with state-of-the-art language technology is important for building intelligent systems that support healthcare and social good. This processing includes creating language understanding models and translating resources into other natural languages to share domain-specific cross-lingual knowledge. In this work, we investigate clinical text machine translation by examining multilingual neural network models based on deep learning, such as Transformer-based structures. Furthermore, to address the language resource imbalance issue, we carry out experiments using a transfer learning methodology based on massive multilingual pre-trained language models (MMPLMs). The experimental results on three sub-tasks, (1) clinical case (CC), (2) clinical terminology (CT), and (3) ontological concept (OC), show that our models achieved top-level performance in the ClinSpEn-2022 shared task on English-Spanish clinical domain data. Furthermore, our expert-based human evaluations demonstrate that the small-sized pre-trained language model (PLM) outperformed the other two extra-large language models by a large margin in clinical domain fine-tuning, a finding that had not previously been reported in the field. Finally, the transfer learning method works well in our experimental setting using the WMT21fb model to accommodate a new language space, Spanish, that was not seen at the pre-training stage of WMT21fb itself; this deserves further exploration for clinical knowledge transformation, e.g. by investigating more languages. These findings can shed light on domain-specific machine translation development, especially in clinical and healthcare fields. Further research projects can build on our work to improve healthcare text analytics and knowledge transformation.
Our data is openly available for research purposes at: https://github.com/HECTA-UoM/ClinicalNMT.
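
The abstract does not say how Spanish was introduced to a model that never saw it during pre-training. As a rough illustration only, the sketch below assumes one possible setup: the multilingual model picks its output language from a tag prepended to the source sentence, and fine-tuning data for the unseen language borrows the tag of a language the model already knows. The tag strings, the `SUPPORTED_TAGS` table, and both function names are hypothetical and are not taken from the paper or from the WMT21fb release.

```python
# Hypothetical sketch: building English-to-Spanish fine-tuning examples for a
# multilingual model whose tag vocabulary has no Spanish entry.
# Assumption: the output language is selected by a tag prepended to the source.

# Illustrative tag table; NOT the real language list of any released model.
SUPPORTED_TAGS = {"de": "__de__", "ru": "__ru__", "hi": "__hi__"}


def resolve_tag(target_lang, supported, borrowed_from):
    """Return the tag for target_lang, borrowing another tag if it is unseen."""
    if target_lang in supported:
        return supported[target_lang]
    if borrowed_from not in supported:
        raise ValueError(f"cannot borrow a tag from unknown language: {borrowed_from}")
    # The borrowed tag is re-purposed for the new language during fine-tuning.
    return supported[borrowed_from]


def build_examples(pairs, target_lang, supported, borrowed_from):
    """Turn (source, target) sentence pairs into tagged fine-tuning examples."""
    tag = resolve_tag(target_lang, supported, borrowed_from)
    examples = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target:
            continue  # skip empty or whitespace-only segments
        examples.append({"source": f"{tag} {source}", "target": target})
    return examples


clinical_pairs = [
    ("The patient has a fever.", "El paciente tiene fiebre."),
    ("   ", "Segmento sin fuente."),  # dropped: empty source
]
examples = build_examples(clinical_pairs, "es", SUPPORTED_TAGS, borrowed_from="hi")
print(examples)
```

Under this assumption the fine-tuned model would be queried with the same borrowed tag at inference time. Adding a new tag token and embedding would be an alternative design, at the cost of changing the model's vocabulary.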

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/fa1daaa449e1/fdgth-06-1211564-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/e535656f9db3/fdgth-06-1211564-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/e6b7556ae6b3/fdgth-06-1211564-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/7d6681064452/fdgth-06-1211564-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/33fb2e196e8a/fdgth-06-1211564-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/a18da7e086d0/fdgth-06-1211564-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/1355a9f46954/fdgth-06-1211564-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/9455e75885c1/fdgth-06-1211564-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/f136c49ab9c9/fdgth-06-1211564-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/34df284e44fb/fdgth-06-1211564-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/e7850eabc8f7/fdgth-06-1211564-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/99851323a952/fdgth-06-1211564-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1d8c/10926203/26505fea2852/fdgth-06-1211564-g013.jpg
