Bajaj Goonmeet, Nguyen Vinh, Wijesiriwardene Thilini, Yip Hong Yung, Javangula Vishesh, Parthasarathy Srinivasan, Sheth Amit, Bodenreider Olivier
The Ohio State University.
National Library of Medicine.
Proc Conf Assoc Comput Linguist Meet. 2022 May;2022:82-87. doi: 10.18653/v1/2022.insights-1.11.
Recent work uses a Siamese Network, initialized with BioWordVec embeddings (distributed word embeddings), to predict synonymy among biomedical terms and thereby automate part of the UMLS (Unified Medical Language System) Metathesaurus construction process. We evaluate contextualized word embeddings extracted from nine different biomedical BERT-based models for synonymy prediction in the UMLS by replacing the BioWordVec embeddings with embeddings extracted from each biomedical BERT model using different feature extraction methods. Surprisingly, we find that Siamese Networks initialized with BioWordVec embeddings still outperform Siamese Networks initialized with embeddings extracted from the biomedical BERT models.
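The general idea of a Siamese setup, in which both terms pass through the same encoder and the resulting vectors are compared, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the architecture evaluated in the paper: the three-dimensional `TOY_EMBEDDINGS` table is hypothetical and stands in for pretrained vectors, mean pooling replaces any learned encoder layers, and the cosine threshold of 0.95 is arbitrary.

```python
import math

# Hypothetical toy vectors standing in for pretrained word embeddings.
# Real embeddings (BioWordVec or BERT-derived) have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "heart": [0.9, 0.1, 0.0],
    "attack": [0.1, 0.9, 0.0],
    "myocardial": [0.8, 0.2, 0.1],
    "infarction": [0.15, 0.85, 0.1],
    "bone": [0.0, 0.1, 0.9],
    "fracture": [0.1, 0.2, 0.85],
}


def encode(term, table=TOY_EMBEDDINGS):
    """Shared encoder: mean-pool the vectors of the words in a term.

    Both inputs of the pair go through this same function, which is
    the defining property of a Siamese arrangement.
    """
    vectors = [table[word] for word in term.lower().split() if word in table]
    if not vectors:
        raise ValueError(f"no known words in term: {term!r}")
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predict_synonymy(term_a, term_b, table=TOY_EMBEDDINGS, threshold=0.95):
    """Return (similarity score, synonymy decision) for a pair of terms."""
    score = cosine(encode(term_a, table), encode(term_b, table))
    return score, score >= threshold


if __name__ == "__main__":
    for pair in [("heart attack", "myocardial infarction"),
                 ("heart attack", "bone fracture")]:
        score, is_synonym = predict_synonymy(*pair)
        print(pair, round(score, 3), is_synonym)
```

In this sketch, passing a different `table` to `predict_synonymy` plays the role of swapping one embedding source for another while the rest of the pipeline stays fixed, which mirrors the kind of comparison the abstract describes.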