
A pre-trained BERT for Korean medical natural language processing.

Affiliations

School of Computer Science and Information Engineering, The Catholic University of Korea, Bucheon, Republic of Korea.

Korea University Research Institute for Medical Bigdata Science, Korea University, Seoul, Republic of Korea.

Publication information

Sci Rep. 2022 Aug 16;12(1):13847. doi: 10.1038/s41598-022-17806-8.

DOI: 10.1038/s41598-022-17806-8
PMID: 35974113
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9381714/
Abstract

With advances in deep learning and natural language processing (NLP), the analysis of medical texts is becoming increasingly important. Nonetheless, despite the importance of processing medical texts, no research on Korean medical-specific language models has been conducted. The Korean medical text is highly difficult to analyze because of the agglutinative characteristics of the language, as well as the complex terminologies in the medical domain. To solve this problem, we collected a Korean medical corpus and used it to train the language models. In this paper, we present a Korean medical language model based on deep learning NLP. The model was trained using the pre-training framework of BERT for the medical context based on a state-of-the-art Korean language model. The pre-trained model showed increased accuracies of 0.147 and 0.148 for the masked language model with next sentence prediction. In the intrinsic evaluation, the next sentence prediction accuracy improved by 0.258, which is a remarkable enhancement. In addition, the extrinsic evaluation of Korean medical semantic textual similarity data showed a 0.046 increase in the Pearson correlation, and the evaluation for the Korean medical named entity recognition showed a 0.053 increase in the F1-score.
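The pre-training described in the abstract follows the standard BERT recipe: masked language modeling (MLM) combined with next sentence prediction (NSP). As a minimal, self-contained sketch of the MLM input-corruption step only — the `[MASK]` id, vocabulary size, and masking probability below are illustrative assumptions, not the paper's actual configuration:

```python
import random

# BERT-style masking for the masked language model (MLM) objective:
# 15% of tokens are selected for prediction; of those, 80% are replaced
# by [MASK], 10% by a random vocabulary token, and 10% are left as-is.
# Labels keep the original token id only at selected positions; -100
# elsewhere marks positions ignored by the training loss.
MASK_ID = 4          # assumed [MASK] token id
VOCAB_SIZE = 32000   # assumed vocabulary size

def mask_tokens(token_ids, mask_prob=0.15, rng=None):
    rng = rng or random.Random(0)
    inputs, labels = [], []
    for tid in token_ids:
        if rng.random() < mask_prob:
            labels.append(tid)          # model must predict the original id
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_ID)              # 80%: [MASK]
            elif r < 0.9:
                inputs.append(rng.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                inputs.append(tid)                  # 10%: unchanged
        else:
            labels.append(-100)         # not selected: ignored by the loss
            inputs.append(tid)
    return inputs, labels
```

The 10% random / 10% unchanged split keeps the encoder from relying on the literal `[MASK]` symbol, since that symbol never appears at fine-tuning time.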


Similar articles

1. A pre-trained BERT for Korean medical natural language processing. Sci Rep. 2022 Aug 16;12(1):13847. doi: 10.1038/s41598-022-17806-8.
2. A comparison of word embeddings for the biomedical natural language processing. J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.
3. Korean clinical entity recognition from diagnosis text using BERT. BMC Med Inform Decis Mak. 2020 Sep 30;20(Suppl 7):242. doi: 10.1186/s12911-020-01241-8.
4. The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview. JMIR Med Inform. 2020 Nov 27;8(11):e23375. doi: 10.2196/23375.
5. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
6. Predicting Semantic Similarity Between Clinical Sentence Pairs Using Transformer Models: Evaluation and Representational Analysis. JMIR Med Inform. 2021 May 26;9(5):e23099. doi: 10.2196/23099.
7. Comparison of an Ensemble of Machine Learning Models and the BERT Language Model for Analysis of Text Descriptions of Brain CT Reports to Determine the Presence of Intracranial Hemorrhage. Sovrem Tekhnologii Med. 2024;16(1):27-34. doi: 10.17691/stm2024.16.1.03. Epub 2024 Feb 28.
8. MLM-based typographical error correction of unstructured medical texts for named entity recognition. BMC Bioinformatics. 2022 Nov 16;23(1):486. doi: 10.1186/s12859-022-05035-9.
9. A clinical specific BERT developed using a huge Japanese clinical text corpus. PLoS One. 2021 Nov 9;16(11):e0259763. doi: 10.1371/journal.pone.0259763. eCollection 2021.
10. GERNERMED++: Semantic annotation in German medical NLP through transfer-learning, translation and word alignment. J Biomed Inform. 2023 Nov;147:104513. doi: 10.1016/j.jbi.2023.104513. Epub 2023 Oct 13.

Cited by

1. Machine learning for automated cause-of-death classification from 2021 to 2022 in Korea: development and validation of an ICD-10 prediction model. Ewha Med J. 2025 Jul;48(3):e45. doi: 10.12771/emj.2025.00675. Epub 2025 Jul 28.
2. Beyond digital twins: the role of foundation models in enhancing the interpretability of multiomics modalities in precision medicine. FEBS Open Bio. 2025 Aug;15(8):1192-1208. doi: 10.1002/2211-5463.70003. Epub 2025 Feb 24.
3. Natural language processing of electronic medical records identifies cardioprotective agents for anthracycline induced cardiotoxicity. Sci Rep. 2025 Feb 24;15(1):6678. doi: 10.1038/s41598-025-91187-6.
4. Entity-enhanced BERT for medical specialty prediction based on clinical questionnaire data. PLoS One. 2025 Jan 30;20(1):e0317795. doi: 10.1371/journal.pone.0317795. eCollection 2025.
5. A pediatric emergency prediction model using natural language process in the pediatric emergency department. Sci Rep. 2025 Jan 28;15(1):3574. doi: 10.1038/s41598-025-87161-x.
6. Performance of GPT-3.5 and GPT-4 on the Korean Pharmacist Licensing Examination: Comparison Study. JMIR Med Educ. 2024 Dec 4;10:e57451. doi: 10.2196/57451.
7. Post-marketing surveillance of anticancer drugs using natural language processing of electronic medical records. NPJ Digit Med. 2024 Nov 9;7(1):315. doi: 10.1038/s41746-024-01323-1.
8. Fine-Tuned Bidirectional Encoder Representations From Transformers Versus ChatGPT for Text-Based Outpatient Department Recommendation: Comparative Study. JMIR Form Res. 2024 Oct 18;8:e47814. doi: 10.2196/47814.
9. Comparison of an Ensemble of Machine Learning Models and the BERT Language Model for Analysis of Text Descriptions of Brain CT Reports to Determine the Presence of Intracranial Hemorrhage. Sovrem Tekhnologii Med. 2024;16(1):27-34. doi: 10.17691/stm2024.16.1.03. Epub 2024 Feb 28.
10. Transformer models in biomedicine. BMC Med Inform Decis Mak. 2024 Jul 29;24(1):214. doi: 10.1186/s12911-024-02600-5.

References

1. A Word Pair Dataset for Semantic Similarity and Relatedness in Korean Medical Vocabulary: Reference Development and Validation. JMIR Med Inform. 2021 Jun 24;9(6):e29667. doi: 10.2196/29667.
2. The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview. JMIR Med Inform. 2020 Nov 27;8(11):e23375. doi: 10.2196/23375.
3. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.