Replacing non-biomedical concepts improves embedding of biomedical concepts.

Authors

Niyonkuru Enock, Gomez Mauricio Soto, Casiraghi Elena, Antogiovanni Stephan, Blau Hannah, Reese Justin T, Valentini Giorgio, Robinson Peter N

Affiliations

The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.

Trinity College, Hartford, CT, USA.

Publication information

bioRxiv. 2024 Jul 4:2024.07.01.601556. doi: 10.1101/2024.07.01.601556.

Abstract

OBJECTIVES

Concept embeddings are low-dimensional vector representations of concepts such as MeSH:D009203 (Myocardial Infarction), whose similarity in the embedded vector space reflects their semantic similarity. Here, we test the hypothesis that replacing synonyms of non-biomedical concepts can improve the quality of biomedical concept embeddings.

MATERIALS AND METHODS

We developed an approach that leverages WordNet to replace sets of synonyms with the most common representative of the synonym set.
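The replacement step can be illustrated with a minimal Python sketch. It is not the wn2vec implementation: the hard-coded `TOY_SYNSETS` list is a hypothetical stand-in for synonym sets that would come from WordNet, and "most common representative" is assumed here to mean the member of a synonym set that occurs most often in the corpus. The `protected` argument, also an assumption, marks biomedical concept tokens that should never be replaced.

```python
from collections import Counter

# Hypothetical synonym sets standing in for WordNet synsets (assumption:
# a real pipeline would obtain these from WordNet rather than hard-code them).
TOY_SYNSETS = [
    {"big", "large", "great"},
    {"show", "demonstrate", "exhibit"},
]


def build_replacement_map(tokens, synsets):
    """Map each synonym to the member of its set that is most frequent in the corpus."""
    counts = Counter(tokens)
    mapping = {}
    for synset in synsets:
        # Only synonyms that actually occur in the corpus matter.
        present = [word for word in synset if counts[word] > 0]
        if len(present) < 2:
            continue  # nothing to merge for this synonym set
        # Highest count wins; ties are broken alphabetically for determinism.
        representative = min(present, key=lambda word: (-counts[word], word))
        for word in present:
            mapping[word] = representative
    return mapping


def replace_synonyms(tokens, mapping, protected=frozenset()):
    """Replace tokens by their representative, leaving protected tokens untouched."""
    return [t if t in protected else mapping.get(t, t) for t in tokens]


# Toy corpus; "meshd009203" imitates a biomedical concept token (assumption).
corpus = (
    "large studies show that a big cohort can demonstrate "
    "large effects of meshd009203"
).split()

mapping = build_replacement_map(corpus, TOY_SYNSETS)
replaced = replace_synonyms(corpus, mapping, protected={"meshd009203"})
print(" ".join(replaced))
```

After replacement, several surface forms collapse into one token, so the embedding algorithm sees more occurrences of fewer non-biomedical words around each biomedical concept.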

RESULTS

We tested our approach on 1055 concept sets and found that, on average, the mean intra-cluster distance in the vector space was reduced by 8%. Assuming that homophily of related concepts in the vector space is desirable, our approach tends to improve the quality of embeddings.
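To make the evaluation metric concrete, the sketch below computes a mean intra-cluster distance for one concept set and the relative reduction between two embeddings. The abstract does not name the distance measure, so cosine distance is an assumption. The two-dimensional vectors are invented toy values and do not reproduce any result from the study.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_intra_cluster_distance(vectors):
    """Average the pairwise distance over all unordered pairs in one concept set."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("a concept set needs at least two vectors")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def relative_reduction(before, after):
    """Fraction by which the distance shrank (positive means tighter clusters)."""
    return (before - after) / before


# Toy embeddings of the same three concepts before and after synonym
# replacement (illustrative values only).
baseline_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
replaced_vectors = [[1.0, 0.2], [0.2, 1.0], [1.0, 1.0]]

d_before = mean_intra_cluster_distance(baseline_vectors)
d_after = mean_intra_cluster_distance(replaced_vectors)
print(f"relative reduction: {relative_reduction(d_before, d_after):.1%}")
```

Under this reading, an average reduction of 8% across concept sets would correspond to averaging `relative_reduction` over all 1055 sets, although the study may aggregate differently.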

DISCUSSION AND CONCLUSION

This pilot study shows that non-biomedical synonym replacement tends to improve the quality of biomedical concept embeddings produced by the Word2Vec algorithm. Our approach is implemented in a freely available Python package at https://github.com/TheJacksonLaboratory/wn2vec.
