Reflective Random Indexing and indirect inference: a scalable method for discovery of implicit connections.

Affiliation

Center for Cognitive Informatics and Decision Making, School of Health Information Sciences, University of Texas, Houston, TX, USA.

Publication information

J Biomed Inform. 2010 Apr;43(2):240-56. doi: 10.1016/j.jbi.2009.09.003. Epub 2009 Sep 15.

Abstract

The discovery of implicit connections between terms that do not occur together in any scientific document underlies the model of literature-based knowledge discovery first proposed by Swanson. Corpus-derived statistical models of semantic distance such as Latent Semantic Analysis (LSA) have been evaluated previously as methods for the discovery of such implicit connections. However, LSA in particular is dependent on a computationally demanding method of dimension reduction as a means to obtain meaningful indirect inference, limiting its ability to scale to large text corpora. In this paper, we evaluate the ability of Random Indexing (RI), a scalable distributional model of word associations, to draw meaningful implicit relationships between terms in general and biomedical language. Proponents of this method have achieved comparable performance to LSA on several cognitive tasks while using a simpler and less computationally demanding method of dimension reduction than LSA employs. In this paper, we demonstrate that the original implementation of RI is ineffective at inferring meaningful indirect connections, and evaluate Reflective Random Indexing (RRI), an iterative variant of the method that is better able to perform indirect inference. RRI is shown to lead to more clearly related indirect connections and to outperform existing RI implementations in the prediction of future direct co-occurrence in the MEDLINE corpus.
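The contrast between basic RI and the reflective variant can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give algorithmic details, so the sketch assumes the commonly described scheme in which each document receives a sparse random index vector, term vectors are sums of the index vectors of the documents containing them, and one "reflection" rebuilds document vectors from those term vectors and then retrains the term vectors. The toy corpus, the names `term_a`, `bridge`, and `term_c`, the dimensionality, and the absence of any term weighting or normalization are all assumptions made for illustration.

```python
import math
import random


def index_vector(dim, nonzero, rng):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def add_into(target, source):
    """Add source to target element-wise, in place."""
    for i, value in enumerate(source):
        target[i] += value


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def term_vectors_from_docs(docs, doc_vecs, dim):
    """Each term vector is the sum of the vectors of documents containing it."""
    terms = {}
    for doc, dvec in zip(docs, doc_vecs):
        for term in set(doc):
            add_into(terms.setdefault(term, [0.0] * dim), dvec)
    return terms


def doc_vectors_from_terms(docs, term_vecs, dim):
    """Each document vector is the sum of the vectors of its terms."""
    out = []
    for doc in docs:
        dvec = [0.0] * dim
        for term in set(doc):
            add_into(dvec, term_vecs[term])
        out.append(dvec)
    return out


# Hypothetical toy corpus: term_a and term_c never co-occur,
# but both co-occur with "bridge".
DIM, NONZERO = 200, 10  # assumed sizes, chosen only for the demo
rng = random.Random(0)
docs = [["term_a", "bridge"], ["bridge", "term_c"]]

# Basic RI: term vectors built directly from random document index vectors.
doc_index = [index_vector(DIM, NONZERO, rng) for _ in docs]
ri_terms = term_vectors_from_docs(docs, doc_index, DIM)
sim_before = cosine(ri_terms["term_a"], ri_terms["term_c"])

# One reflective step: rebuild document vectors from the term vectors,
# then retrain the term vectors from those document vectors.
reflected_docs = doc_vectors_from_terms(docs, ri_terms, DIM)
rri_terms = term_vectors_from_docs(docs, reflected_docs, DIM)
sim_after = cosine(rri_terms["term_a"], rri_terms["term_c"])

print(f"basic RI similarity:   {sim_before:.3f}")
print(f"after one reflection:  {sim_after:.3f}")
```

In this toy setting, basic RI gives `term_a` and `term_c` nearly orthogonal vectors, because each is built from a different random index vector. After the reflective step both vectors carry a contribution from the shared `bridge` term, so their similarity rises. This is only meant to convey why iteration could help indirect inference; it says nothing about the evaluation results reported in the paper.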
