Reflective Random Indexing and indirect inference: a scalable method for discovery of implicit connections.

Affiliation

Center for Cognitive Informatics and Decision Making, School of Health Information Sciences, University of Texas, Houston, TX, USA.

Publication information

J Biomed Inform. 2010 Apr;43(2):240-56. doi: 10.1016/j.jbi.2009.09.003. Epub 2009 Sep 15.

DOI:10.1016/j.jbi.2009.09.003

PMID:19761870

Abstract

The discovery of implicit connections between terms that do not occur together in any scientific document underlies the model of literature-based knowledge discovery first proposed by Swanson. Corpus-derived statistical models of semantic distance such as Latent Semantic Analysis (LSA) have been evaluated previously as methods for the discovery of such implicit connections. However, LSA in particular is dependent on a computationally demanding method of dimension reduction as a means to obtain meaningful indirect inference, limiting its ability to scale to large text corpora. In this paper, we evaluate the ability of Random Indexing (RI), a scalable distributional model of word associations, to draw meaningful implicit relationships between terms in general and biomedical language. Proponents of this method have achieved comparable performance to LSA on several cognitive tasks while using a simpler and less computationally demanding method of dimension reduction than LSA employs. In this paper, we demonstrate that the original implementation of RI is ineffective at inferring meaningful indirect connections, and evaluate Reflective Random Indexing (RRI), an iterative variant of the method that is better able to perform indirect inference. RRI is shown to lead to more clearly related indirect connections and to outperform existing RI implementations in the prediction of future direct co-occurrence in the MEDLINE corpus.
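The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of the iterative variant: basic RI builds term vectors from random document vectors, and each reflective cycle rebuilds document vectors from the term vectors and then term vectors from those document vectors. The function names, the sparse ternary vectors, the dimension (512), the number of nonzero entries (8), and the two-document toy corpus (loosely modeled on Swanson's fish oil and Raynaud example) are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random
from collections import defaultdict


def random_index_vector(dim, nonzero, rng):
    """Sparse ternary vector: a few +1/-1 entries, zeros elsewhere (assumed form)."""
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def add_into(target, source):
    """Add source to target element-wise, in place."""
    for i, value in enumerate(source):
        target[i] += value


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def term_vectors_from_docs(docs, doc_vectors, dim):
    """Each term vector is the sum of the vectors of the documents containing it."""
    terms = defaultdict(lambda: [0.0] * dim)
    for doc, doc_vec in zip(docs, doc_vectors):
        for term in doc:
            add_into(terms[term], doc_vec)
    return dict(terms)


def doc_vectors_from_terms(docs, term_vectors, dim):
    """Each document vector is the sum of the vectors of the terms it contains."""
    result = []
    for doc in docs:
        doc_vec = [0.0] * dim
        for term in doc:
            add_into(doc_vec, term_vectors[term])
        result.append(doc_vec)
    return result


def reflective_random_indexing(docs, dim=512, nonzero=8, cycles=1, seed=0):
    """Return (basic RI term vectors, term vectors after the reflective cycles)."""
    rng = random.Random(seed)
    doc_vectors = [random_index_vector(dim, nonzero, rng) for _ in docs]
    basic = term_vectors_from_docs(docs, doc_vectors, dim)  # basic RI step
    term_vectors = basic
    for _ in range(cycles):
        # Reflective step: retrain documents on terms, then terms on documents.
        doc_vectors = doc_vectors_from_terms(docs, term_vectors, dim)
        term_vectors = term_vectors_from_docs(docs, doc_vectors, dim)
    return basic, term_vectors


# Toy corpus (hypothetical): the first and last terms never co-occur,
# but both co-occur with "blood_viscosity".
docs = [["fish_oil", "blood_viscosity"], ["blood_viscosity", "raynaud"]]
basic, reflected = reflective_random_indexing(docs)
before = cosine(basic["fish_oil"], basic["raynaud"])
after = cosine(reflected["fish_oil"], reflected["raynaud"])
print(f"basic RI: {before:.3f}  after one reflective cycle: {after:.3f}")
```

In this toy setting the two terms that never share a document start out with vectors built from unrelated random document vectors, and the reflective cycle pulls them together through the shared intermediate term. That is the indirect-inference behavior the abstract attributes to RRI, shown here only on a constructed example rather than on MEDLINE.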

Similar articles

The trajectory of scientific discovery: concept co-occurrence and converging semantic distance.

Stud Health Technol Inform. 2010;160(Pt 1):661-5.

Reflective random indexing for semi-automatic indexing of the biomedical literature.

J Biomed Inform. 2010 Oct;43(5):694-700. doi: 10.1016/j.jbi.2010.04.001. Epub 2010 Apr 9.

Knowledge discovery by automated identification and ranking of implicit relationships.

Bioinformatics. 2004 Feb 12;20(3):389-98. doi: 10.1093/bioinformatics/btg421. Epub 2004 Jan 22.

Using statistical and knowledge-based approaches for literature-based discovery.

J Biomed Inform. 2006 Dec;39(6):600-11. doi: 10.1016/j.jbi.2005.11.010. Epub 2006 Jan 4.

Recognizing names in biomedical texts: a machine learning approach.

Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10.

A new evaluation methodology for literature-based discovery systems.

J Biomed Inform. 2009 Aug;42(4):633-43. doi: 10.1016/j.jbi.2008.12.001. Epub 2008 Dec 16.

Medical knowledge evolution query constraining aspects.

Stud Health Technol Inform. 2011;169:549-53.

Term identification in the biomedical literature.

J Biomed Inform. 2004 Dec;37(6):512-26. doi: 10.1016/j.jbi.2004.08.004.

Text Mining approaches for automated literature knowledge extraction and representation.

Stud Health Technol Inform. 2010;160(Pt 2):954-8.

Cited by

Drug repurposing for COVID-19 via knowledge graph completion.

J Biomed Inform. 2021 Mar;115:103696. doi: 10.1016/j.jbi.2021.103696. Epub 2021 Feb 8.

Content-Sensitive Characterization of Peer Interactions of Highly Engaged Users in an Online Community for Smoking Cessation: Mixed-Methods Approach for Modeling User Engagement in Health Promotion Interventions.

J Particip Med. 2018 Jul 24;10(3):e9. doi: 10.2196/jopm.9745.

Neural networks for open and closed Literature-based Discovery.

PLoS One. 2020 May 15;15(5):e0232891. doi: 10.1371/journal.pone.0232891. eCollection 2020.

Mining HPV Vaccine Knowledge Structures of Young Adults From Reddit Using Distributional Semantics and Pathfinder Networks.

Cancer Control. 2020 Jan-Dec;27(1):1073274819891442. doi: 10.1177/1073274819891442.

Feature extraction for phenotyping from semantic and knowledge resources.

J Biomed Inform. 2019 Mar;91:103122. doi: 10.1016/j.jbi.2019.103122. Epub 2019 Feb 7.

Rapamycin - mTOR + BRAF = ? Using relational similarity to find therapeutically relevant drug-gene relationships in unstructured text.

J Biomed Inform. 2019 Feb;90:103094. doi: 10.1016/j.jbi.2019.103094. Epub 2019 Jan 4.

Learning predictive models of drug side-effect relationships from distributed representations of literature-derived semantic predications.

J Am Med Inform Assoc. 2018 Oct 1;25(10):1339-1350. doi: 10.1093/jamia/ocy077.

Exploiting semantic patterns over biomedical knowledge graphs for predicting treatment and causative relations.

J Biomed Inform. 2018 Jun;82:189-199. doi: 10.1016/j.jbi.2018.05.003. Epub 2018 May 12.

DataMed - an open source discovery index for finding biomedical datasets.

J Am Med Inform Assoc. 2018 Mar 1;25(3):300-308. doi: 10.1093/jamia/ocx121.

A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge.

Database (Oxford). 2017 Jan 1;2017. doi: 10.1093/database/bax061.
