Predicting protein-protein relationships from literature using latent topics.

Author information

Aso Tatsuya, Eguchi Koji

Affiliation

Department of Computer Science and Systems Engineering, Kobe University, 1-1 Rokkoudai, Nada-ku, Kobe 657-8501, Japan.

Publication information

Genome Inform. 2009 Oct;23(1):3-12.

Abstract

This paper investigates applying statistical topic models to extract and predict relationships between biological entities, especially protein mentions. Latent Dirichlet Allocation (LDA), a statistical topic model, is promising; however, it has not been investigated for such a task. In this paper, we estimate the LDA model with two inference methods: the state-of-the-art Collapsed Variational Bayesian inference and Gibbs sampling. We also apply probabilistic Latent Semantic Analysis (pLSA) as a baseline and compare the models in terms of log-likelihood, classification accuracy, and retrieval effectiveness. Our experiments demonstrate that the Collapsed Variational LDA gives better results than the others, especially in classification accuracy and retrieval effectiveness on the protein-protein relationship prediction task.
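To make the Gibbs sampling inference mentioned in the abstract concrete, the following is a minimal sketch of a collapsed Gibbs sampler for LDA. It is not the authors' implementation: the function name `lda_gibbs`, the hyperparameter values, and the toy corpus of integer word ids are all assumptions chosen for illustration, and the abstract does not say how protein mentions were encoded as documents.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs is a list of documents, each a list of integer word ids.
    Returns (theta, phi): per-document topic proportions and
    per-topic word distributions, estimated from the final counts.
    """
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Assign every token a random initial topic and fill the counts.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final sample.
    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
         for w in range(vocab_size)]
        for t in range(n_topics)
    ]
    return theta, phi


# Toy corpus (hypothetical): word ids 0-2 and 3-5 tend to co-occur.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 1], [4, 5, 3, 5]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=6)
```

Each row of `theta` and `phi` is a probability distribution. The collapsed variational approach that the abstract reports as performing best works with the same collapsed model but replaces the sampling step with deterministic updates; it is not shown here.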
