

An improved approach to infer protein-protein interaction based on a hierarchical vector space model.

Affiliations

Department of Computer Science & Technology, East China Normal University, North Zhongshan Road, Shanghai, 200062, China.

School of Life Sciences, East China Normal University, Dongchuan Road, Shanghai, 200241, China.

Publication information

BMC Bioinformatics. 2018 Apr 27;19(1):161. doi: 10.1186/s12859-018-2152-z.

Abstract

BACKGROUND

Comparing and classifying the functions of gene products is important in today's biomedical research. Semantic similarity derived from Gene Ontology (GO) annotations has become one of the most widely used indicators of protein interaction. Among the various approaches proposed, those based on the vector space model are relatively simple, but their effectiveness is far from satisfactory.

RESULTS

We propose a Hierarchical Vector Space Model (HVSM) for computing semantic similarity between genes or their products. HVSM enhances the basic vector space model by introducing the relations between GO terms: besides the directly annotated terms, it also takes into account their ancestors and descendants related by "is_a" and "part_of" relations. Moreover, HVSM introduces a Certainty Factor that calibrates the semantic similarity according to the number of terms annotated to each gene. To assess the performance of our method, we applied HVSM to Homo sapiens and Saccharomyces cerevisiae protein-protein interaction datasets. Compared with TCSS, Resnik, and other classic similarity measures, HVSM achieved a significant improvement in distinguishing positive from negative protein interactions. We also tested its correlation with sequence, EC, and Pfam similarity using the online tool CESSM.
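The abstract does not give HVSM's formulas, so the following is only a minimal sketch of the general idea of a hierarchy-aware vector space model, under assumptions of our own. The five-term DAG fragment is hypothetical, direct annotations get weight 1.0, and related ancestors and descendants get a hypothetical `related_weight` of 0.5. The Certainty Factor is omitted because its definition is not stated here. The sketch shows why expanding annotations along "is_a" and "part_of" edges can give two genes a nonzero similarity even when their directly annotated terms do not overlap.

```python
from math import sqrt

# Hypothetical toy fragment of a GO DAG: child term -> list of (parent, relation).
PARENTS = {
    "GO:C": [("GO:B", "is_a")],
    "GO:B": [("GO:A", "part_of")],
    "GO:D": [("GO:A", "is_a")],
    "GO:E": [("GO:C", "regulates")],  # not followed: only is_a / part_of are used
}
ALLOWED = {"is_a", "part_of"}


def ancestors(term):
    """All terms reachable upward from `term` through is_a / part_of edges."""
    seen, stack = set(), [term]
    while stack:
        for parent, rel in PARENTS.get(stack.pop(), []):
            if rel in ALLOWED and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendants(term):
    """All terms that have `term` among their is_a / part_of ancestors."""
    return {child for child in PARENTS if term in ancestors(child)}


def hierarchical_vector(annotations, related_weight=0.5):
    """Sparse term -> weight vector: direct terms 1.0, related terms `related_weight` (assumed)."""
    vec = {}
    for term in annotations:
        for rel in ancestors(term) | descendants(term):
            vec[rel] = max(vec.get(rel, 0.0), related_weight)
    for term in annotations:
        vec[term] = 1.0  # a direct annotation always keeps full weight
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


gene1 = hierarchical_vector({"GO:C"})
gene2 = hierarchical_vector({"GO:D"})
print(cosine(gene1, gene2))
```

With only the direct annotations (`{"GO:C": 1.0}` versus `{"GO:D": 1.0}`), the basic cosine would be 0. After expansion, both genes share the ancestor `GO:A`, so the similarity becomes positive.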

CONCLUSIONS

In terms of AUC, HVSM showed improvements of up to 4% over TCSS, 8% over IntelliGO, 12% over basic VSM, 6% over Resnik, 8% over Lin, 11% over Jiang, 8% over Schlicker, and 11% over SimGIC. The CESSM test showed that HVSM was comparable to SimGIC and superior to TCSS and to all other similarity measures in CESSM. Supplementary information and the software are available at https://github.com/kejia1215/HVSM .
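The AUC used to compare the measures can be read as the probability that a randomly chosen positive (interacting) pair receives a higher similarity score than a randomly chosen negative pair. A minimal sketch of that pairwise (Mann-Whitney) formulation, counting ties as one half, is shown below; the scores are invented toy numbers, not results from the paper, and the evaluation code actually used by the authors may differ.

```python
def auc(pos_scores, neg_scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy similarity scores for interacting (positive) and non-interacting (negative) pairs.
positives = [0.9, 0.7, 0.4]
negatives = [0.5, 0.3, 0.4]
print(auc(positives, negatives))
```

This quadratic loop is meant for clarity; for large interaction datasets a rank-based or library implementation would be the practical choice.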


