IdiSNA, Navarra Institute for Health Research, Pamplona E-31008, Spain.
Bioinformatics Platform, CIMA University of Navarra, Pamplona E-31008, Spain.
J Proteome Res. 2020 Dec 4;19(12):4795-4807. doi: 10.1021/acs.jproteome.0c00364. Epub 2020 Nov 6.
The Human Proteome Project (HPP) is leading the international effort to characterize the human proteome. Although the project initially focused on detecting missing proteins, a new challenge arose from the need to assign biological functions to the uncharacterized human proteins and to describe their implications in human diseases. This challenge, neXt-CP50, covers not only the uncharacterized proteins with experimental evidence (uPE1 proteins) but also the uncharacterized missing proteins (uMPs). In this work, we developed a new bioinformatic approach to infer biological annotations for the uPE1 proteins and uMPs based on a "guilt-by-association" analysis using public RNA-Seq data sets. We used the correlation of these proteins with the well-characterized PE1 proteins to construct a network. We then applied the PageRank algorithm to this network to identify the most relevant nodes, which were the biological annotations of the uncharacterized proteins. All of the generated information was stored in a database. In addition, we implemented the web application UPEFinder (https://upefinder.proteored.org) to facilitate access to this new resource. This information is especially relevant for HPP researchers interested in generating and validating new hypotheses about the functions of these proteins. Both the database and the web application are publicly available (https://github.com/ubioinformat/UPEfinder).
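The guilt-by-association idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify how the network is built or how PageRank is configured, so the graph layout (target protein, co-expressed characterized genes, their annotation terms), the correlation threshold of 0.8, the use of a personalized PageRank seeded at the target, and all gene names, term names, and expression values below are assumptions made purely for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def build_graph(target, expression, annotations, threshold=0.8):
    """Undirected weighted graph: target -- co-expressed genes -- their terms.

    The threshold and the edge weights are illustrative assumptions.
    """
    graph = {target: {}}

    def add_edge(u, v, weight):
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight

    for gene, terms in annotations.items():
        r = pearson(expression[target], expression[gene])
        if abs(r) >= threshold:
            add_edge(target, gene, abs(r))
            for term in terms:
                add_edge(gene, term, 1.0)
    return graph

def pagerank(graph, seed, damping=0.85, iterations=100):
    """Personalized PageRank by power iteration; restarts return to `seed`."""
    rank = {node: 1.0 / len(graph) for node in graph}
    for _ in range(iterations):
        new = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            total = sum(neighbours.values())
            if total == 0:
                # Dangling node: send its mass back to the seed.
                new[seed] += damping * rank[node]
                continue
            for neighbour, weight in neighbours.items():
                new[neighbour] += damping * rank[node] * weight / total
        new[seed] += 1.0 - damping
        rank = new
    return rank

def rank_annotations(target, expression, annotations, threshold=0.8):
    """Return annotation terms reachable from the target, best score first."""
    graph = build_graph(target, expression, annotations, threshold)
    scores = pagerank(graph, seed=target)
    terms = {t for ts in annotations.values() for t in ts}
    found = [(t, scores[t]) for t in terms if t in scores]
    return sorted(found, key=lambda pair: -pair[1])

# Hypothetical toy data: expression values across five RNA-Seq samples.
expression = {
    "uPE1_X": [1.0, 2.0, 3.0, 4.0, 5.0],   # uncharacterized target
    "GENE_A": [2.1, 3.9, 6.2, 8.0, 9.9],   # strongly co-expressed
    "GENE_B": [1.2, 2.1, 2.9, 4.2, 4.8],   # strongly co-expressed
    "GENE_C": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly correlated
}
# Hypothetical annotations of the characterized genes.
annotations = {
    "GENE_A": ["term:cell_cycle", "term:dna_repair"],
    "GENE_B": ["term:cell_cycle"],
    "GENE_C": ["term:lipid_metabolism"],
}

ranked = rank_annotations("uPE1_X", expression, annotations)
for term, score in ranked:
    print(term, round(score, 4))
```

In this toy setting, a term shared by several co-expressed genes accumulates more rank than a term supported by a single gene, and terms attached only to weakly correlated genes never enter the network. A real analysis would need many more samples, a principled choice of threshold, and correction for confounders in the expression data.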