Transfer Learning from BERT to Support Insertion of New Concepts into SNOMED CT.

Authors

Liu Hao, Perl Yehoshua, Geller James

Affiliation

Dept of Computer Science, NJIT, Newark, NJ, USA.

Publication

AMIA Annu Symp Proc. 2020 Mar 4;2019:1129-1138. eCollection 2019.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7153142/

Abstract

With advances in Machine Learning (ML), neural network-based methods, such as Convolutional/Recurrent Neural Networks, have been proposed to assist terminology curators in the development and maintenance of terminologies. Bidirectional Encoder Representations from Transformers (BERT), a new language representation model, achieves state-of-the-art results on a wide array of general English NLP tasks. We explore BERT's applicability to medical terminology-related tasks. Utilizing the "next sentence prediction" capability of BERT, we show that the Fine-tuning strategy of Transfer Learning (TL) from the BERT model can address a challenging problem in automatic terminology enrichment: the insertion of new concepts. Adding a pre-training strategy further improves the results. We apply our strategies to the two largest hierarchies of SNOMED CT, using one release as training data and the following release as test data. The combination of the two proposed TL models achieves average F1 scores of 0.85 and 0.86 for the two hierarchies, respectively.
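
The abstract reuses BERT's next-sentence-prediction head to decide whether a concept belongs under a candidate parent. The sketch below shows, under stated assumptions, how such sentence-pair training examples and the F1 metric might be assembled from one release's IS-A links. It is not the authors' pipeline: the helper names (`to_nsp_input`, `ancestors`, `build_pairs`, `f1_score`), the segment order (new concept as segment A, candidate parent as segment B), the random negative sampling, and the toy concept names are all hypothetical. Tokenization and the BERT model itself are omitted; in practice each pair would be passed to a BERT tokenizer rather than built as a raw string.

```python
import random
from typing import Dict, List, Set, Tuple

# child concept name -> set of direct parent names (hypothetical format;
# a real pipeline would read these from a SNOMED CT release)
Hierarchy = Dict[str, Set[str]]


def to_nsp_input(child: str, candidate_parent: str) -> str:
    """Render a (child, candidate parent) pair as a BERT sentence-pair string.

    Assumption: the child is segment A and the candidate parent is segment B,
    so "is B the right continuation of A?" becomes "is B a parent of A?".
    """
    return f"[CLS] {child} [SEP] {candidate_parent} [SEP]"


def ancestors(hierarchy: Hierarchy, concept: str) -> Set[str]:
    """Collect every transitive parent of `concept` with an iterative DFS."""
    seen: Set[str] = set()
    stack = list(hierarchy.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(hierarchy.get(node, ()))
    return seen


def build_pairs(hierarchy: Hierarchy, negatives_per_positive: int = 1,
                seed: int = 0) -> List[Tuple[str, int]]:
    """Label direct IS-A links 1 and randomly sampled non-ancestor pairs 0."""
    rng = random.Random(seed)
    concepts = sorted(set(hierarchy) | {p for ps in hierarchy.values() for p in ps})
    pairs: List[Tuple[str, int]] = []
    for child in sorted(hierarchy):
        parents = hierarchy[child]
        for parent in sorted(parents):
            pairs.append((to_nsp_input(child, parent), 1))
        # Skip the concept itself and all its ancestors so that transitive
        # IS-A links are never labeled as negatives.
        excluded = ancestors(hierarchy, child) | {child}
        candidates = [c for c in concepts if c not in excluded]
        k = min(negatives_per_positive * len(parents), len(candidates))
        for other in rng.sample(candidates, k):
            pairs.append((to_nsp_input(child, other), 0))
    return pairs


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 = 2PR / (P + R), treating label 1 (IS-A link) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative toy names only; not content from any SNOMED CT release.
    toy = {
        "Fracture of femur": {"Fracture of bone", "Injury of thigh"},
        "Fracture of bone": {"Disorder of bone"},
        "Pneumonia": {"Disorder of lung"},
    }
    for text, label in build_pairs(toy):
        print(label, text)
    print(f1_score([1, 1, 0, 0], [1, 0, 0, 1]))  # 0.5
```

Drawing negatives only from non-ancestors keeps transitive IS-A links (for example, a grandparent) from being mislabeled as negatives; the abstract does not say which sampling rule the study actually used.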
