Exploring Approaches for Detecting Protein Functional Similarity within an Orthology-based Framework.

Author information

Center for Biomedicine, European Academy of Bozen/Bolzano (EURAC), (Affiliated to the University of Lübeck, Lübeck, Germany), Viale Druso 1, 39100, Bolzano, Italy.

Publication information

Sci Rep. 2017 Mar 23;7(1):381. doi: 10.1038/s41598-017-00465-5.

Abstract

Protein functional similarity based on gene ontology (GO) annotations serves as a powerful tool when comparing proteins on a functional level in applications such as protein-protein interaction prediction, gene prioritization, and disease gene discovery. Functional similarity (FS) is usually quantified by combining the GO hierarchy with an annotation corpus that links genes and gene products to GO terms. One large group of algorithms calculates the GO term semantic similarity (SS) between all the terms annotating the two proteins, followed by a second step, described as a "mixing strategy", which combines the SS values to yield the final FS value. Because protein annotation varies, for example as a result of annotation bias, this value cannot be reliably compared on an absolute scale. We therefore introduce a similarity z-score that takes into account the FS background distribution of each protein. For a selection of popular SS measures and mixing strategies, we demonstrate a moderate accuracy improvement when using z-scores in a benchmark that aims to separate orthologous cases from random gene pairs, and we discuss in this context the impact of annotation corpus choice. The approach has been implemented in Frela, a fast high-throughput public web server for protein FS calculation and interpretation.
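
The two-step scheme described in the abstract (pairwise term SS, then a mixing strategy) and the z-score idea can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy `ANCESTORS` table, the Jaccard-on-ancestors term similarity, the choice of best-match average as the mixing strategy, and the z-score taken against a single protein's background list. The abstract does not give the paper's exact formulas or Frela's implementation, so this should not be read as either.

```python
from statistics import mean, pstdev

# Hypothetical toy ontology: each term maps to its ancestor set (itself included).
ANCESTORS = {
    "t1": {"root", "a", "t1"},
    "t2": {"root", "a", "t2"},
    "t3": {"root", "b", "t3"},
}

def term_similarity(a, b):
    """Stand-in SS measure: Jaccard index of the two ancestor sets."""
    shared = ANCESTORS[a] & ANCESTORS[b]
    combined = ANCESTORS[a] | ANCESTORS[b]
    return len(shared) / len(combined)

def best_match_average(terms_a, terms_b, ss=term_similarity):
    """One possible mixing strategy: average each term's best match, both ways."""
    best_for_a = [max(ss(a, b) for b in terms_b) for a in terms_a]
    best_for_b = [max(ss(a, b) for a in terms_a) for b in terms_b]
    return (mean(best_for_a) + mean(best_for_b)) / 2

def z_score(fs_value, background):
    """Standardize an FS value against a background list of FS values.

    Assumption: the background holds the FS values of one protein against
    a reference set of other proteins.
    """
    mu = mean(background)
    sigma = pstdev(background)
    if sigma == 0:
        return 0.0  # degenerate background: no spread to standardize against
    return (fs_value - mu) / sigma

if __name__ == "__main__":
    fs = best_match_average(["t1"], ["t2", "t3"])
    print(fs)
    print(z_score(fs, [0.1, 0.2, 0.3, 0.4]))
```

A raw FS of 0.4 may be high for a sparsely annotated protein and unremarkable for a heavily annotated one. Standardizing against each protein's own background is one way to make such values comparable. How the per-protein scores of the two proteins in a pair are combined is not stated in the abstract and is left out of this sketch.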

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e270/5428484/54c45d5b4244/41598_2017_465_Fig1_HTML.jpg
