McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX.
Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX.
AMIA Annu Symp Proc. 2024 Jan 11;2023:977-986. eCollection 2023.
The Unified Medical Language System (UMLS), a large repository of biomedical vocabularies, supports a wide range of biomedical applications. Ensuring the quality of the UMLS is critical to maintaining both the accuracy of its content and the reliability of downstream applications. In this work, we present a Graph Convolutional Network (GCN)-based approach to identify misaligned synonymous terms, that is, synonymous terms organized under different UMLS concepts. We trained the GCN model using synonymous terms grouped under the same concept as positive samples and the top lexically similar terms as negative samples. We then applied the model to a test set and flagged the negative samples it predicted to be synonymous as potentially misaligned synonymous terms. In total, the model made 147,625 suggestions. A human expert evaluated 100 randomly selected suggestions and agreed with 60 of them (60%). The results indicate that our GCN-based approach shows promise for improving the synonymy grouping in the UMLS.
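The sampling scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how lexical similarity is measured or how many negatives are kept per term, so the token-level Jaccard score, the `top_k` parameter, the restriction of negatives to terms from other concepts, and the helper names `jaccard` and `build_pairs` are all assumptions made for illustration. The concept identifiers in the toy data are invented, not real UMLS CUIs.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed stand-in for lexical similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_pairs(concepts: dict[str, list[str]], top_k: int = 1):
    """Build (term, term) training pairs from a concept -> terms mapping.

    Positives: every pair of terms grouped under the same concept.
    Negatives: for each term, its top_k most lexically similar terms
    drawn from other concepts (assumed interpretation of the abstract).
    """
    positives = [
        pair for terms in concepts.values() for pair in combinations(terms, 2)
    ]
    negatives = []
    for cui, terms in concepts.items():
        # Candidate negatives are all terms that belong to a different concept.
        others = [t for other, ts in concepts.items() if other != cui for t in ts]
        for term in terms:
            ranked = sorted(others, key=lambda t: jaccard(term, t), reverse=True)
            negatives.extend((term, t) for t in ranked[:top_k])
    return positives, negatives


# Toy data with made-up identifiers (not real UMLS CUIs).
toy = {
    "C_A": ["heart attack", "myocardial infarction"],
    "C_B": ["heart failure", "cardiac failure"],
}
positives, negatives = build_pairs(toy)
```

In this toy example, "heart attack" and "heart failure" share a token and sit under different concepts, so they form a negative pair. Under the approach in the abstract, a negative pair that the trained model predicts to be synonymous would be surfaced as a candidate for expert review.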