Cohen Trevor, Widdows Dominic
School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, United States.
Grab, Inc., Seattle, WA, United States.
J Biomed Inform. 2017 Apr;68:150-166. doi: 10.1016/j.jbi.2017.03.003. Epub 2017 Mar 8.
This paper concerns the generation of distributed vector representations of biomedical concepts from structured knowledge, in the form of subject-relation-object triplets known as semantic predications. Specifically, we evaluate the extent to which a representational approach we previously developed for this purpose, Predication-based Semantic Indexing (PSI), might benefit from insights gleaned from neural-probabilistic language models, which have enjoyed a surge in popularity in recent years as a means to generate distributed vector representations of terms from free text. To do so, we develop a novel neural-probabilistic approach to encoding predications, called Embedding of Semantic Predications (ESP), by adapting aspects of the Skipgram with Negative Sampling (SGNS) algorithm to this purpose. We compare ESP and PSI across a number of tasks, including recovery of encoded information, estimation of semantic similarity and relatedness, and identification of potentially therapeutic and harmful relationships using both analogical retrieval and supervised learning. We find advantages for ESP in some, but not all, of these tasks, revealing the contexts in which the additional computational work of neural-probabilistic modeling is justified.