NetQuilt: deep multispecies network-based protein function prediction using homology-informed network similarity.

Authors

Barot Meet, Gligorijević Vladimir, Cho Kyunghyun, Bonneau Richard

Affiliations

Center for Data Science, New York University, New York, NY 10011, USA.

Center for Computational Biology, Flatiron Institute, New York, NY 10010, USA.

Publication information

Bioinformatics. 2021 Aug 25;37(16):2414-2422. doi: 10.1093/bioinformatics/btab098.

Abstract

MOTIVATION

Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to protein functional annotation use sequence similarity to transfer knowledge between species. These approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular context for meaningful prediction. To supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function and have demonstrated promising results in function prediction. However, most of these methods are tied to a network for a single species, and many species lack biological networks.

RESULTS

In this work, we integrate sequence and network information across multiple species by computing IsoRank similarity scores to create a meta-network profile of the proteins of multiple species. We use this integrated multispecies meta-network as input to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples and consequently leads to significant improvements in function prediction performance compared with two network-based methods, a deep learning sequence-based method and the BLAST annotation method used in the Critical Assessment of Functional Annotation. We demonstrate that our approach performs well even when a species has no network information available: when an organism's PPI network is left out, we can use our multispecies method to make predictions for the left-out organism with good performance.
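To make the two ingredients named above more concrete, the following is a minimal NumPy sketch, not the authors' implementation (that is in the linked repository). It assumes the standard IsoRank formulation, in which a cross-species similarity matrix R is the fixed point of R = alpha * P1 R P2^T + (1 - alpha) * E, where P1 and P2 are degree-normalized PPI adjacency matrices and E is a normalized sequence-similarity matrix. It also shows a generic maxout unit, which takes the maximum over several affine maps. The function names, the value of `alpha`, and the array shapes are illustrative assumptions. How NetQuilt assembles these scores into its meta-network profile is not described in the abstract and is not attempted here.

```python
import numpy as np


def column_normalize(adj):
    """Divide each column by its degree so that columns sum to 1."""
    deg = adj.sum(axis=0)
    deg[deg == 0] = 1.0  # leave isolated nodes as all-zero columns
    return adj / deg


def isorank(adj1, adj2, seq_sim, alpha=0.6, n_iter=100, tol=1e-9):
    """Power iteration for an IsoRank-style similarity between two networks.

    adj1: (n1, n1) symmetric adjacency matrix of species 1
    adj2: (n2, n2) symmetric adjacency matrix of species 2
    seq_sim: (n1, n2) non-negative sequence similarity (e.g. BLAST-derived)
    alpha: weight of network topology versus sequence (assumed value)
    """
    p1 = column_normalize(np.asarray(adj1, dtype=float))
    p2 = column_normalize(np.asarray(adj2, dtype=float))
    e = np.asarray(seq_sim, dtype=float)
    e = e / e.sum()
    r = np.full_like(e, 1.0 / e.size)  # uniform starting point
    for _ in range(n_iter):
        # A pair (i, j) scores highly if its neighbours (u, v) score highly.
        r_new = alpha * (p1 @ r @ p2.T) + (1.0 - alpha) * e
        r_new /= r_new.sum()
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


def maxout(x, w, b):
    """Maxout layer: element-wise maximum over k affine pieces.

    x: (batch, d_in), w: (k, d_in, d_out), b: (k, d_out)
    """
    z = np.einsum("bi,kio->bko", x, w) + b  # (batch, k, d_out)
    return z.max(axis=1)


if __name__ == "__main__":
    # Two toy three-protein path networks with a made-up similarity matrix.
    path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
    scores = isorank(path, path, np.eye(3) + 0.1)
    print(scores.round(3))
```

In a pipeline of this kind, each protein's row of similarity scores against the proteins of other species could serve as part of its input feature vector, and a network built from maxout layers would map those features to multi-label Gene Ontology predictions.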

AVAILABILITY AND IMPLEMENTATION

The code is freely available at https://github.com/nowittynamesleft/NetQuilt. The data, including sequences, PPI networks and GO annotations, are available at https://string-db.org/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/479f/8388039/ceca6df09d1a/btab098f1.jpg
