Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland.
Swiss Institute of Bioinformatics, Lausanne, Switzerland.
PeerJ. 2022 Feb 15;10:e12931. doi: 10.7717/peerj.12931. eCollection 2022.
Protein function prediction is an important part of bioinformatics and genomics studies. Many predictors are available; however, most are offered as web servers rather than as open-source, locally installable software. Local versions are necessary for large-scale genomics studies because web servers impose limitations such as queues, prediction speed, and how readily their databases can be updated.
This paper describes Wei2GO, an open-source, Python-based protein function prediction tool built on weighted sequence similarity. It runs DIAMOND and HMMScan sequence alignment searches against the UniProtKB and Pfam databases, respectively, transfers Gene Ontology terms from the reference proteins to the query protein, and uses a weighting algorithm to calculate a score for each Gene Ontology annotation.
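The idea of transferring Gene Ontology terms and weighting them by hit quality can be illustrated with a minimal sketch. It is not Wei2GO's actual algorithm, which this abstract does not spell out: the weight function (the negative base-10 logarithm of the e-value), the normalisation by total weight, and the names `hit_weight` and `score_go_terms` are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def hit_weight(evalue, floor=1e-180):
    """Hypothetical weight for one alignment hit: -log10(e-value).

    The e-value is clamped to `floor` so that a reported e-value of 0
    does not cause a math domain error. Weights are never negative.
    """
    return max(0.0, -math.log10(max(evalue, floor)))


def score_go_terms(hits):
    """Score GO terms transferred from reference hits to one query protein.

    hits: iterable of (evalue, set_of_go_terms) pairs, one per reference hit.
    Returns {go_term: score}, where each score is the share of the total
    hit weight contributed by hits annotated with that term (0 to 1).
    """
    totals = defaultdict(float)
    total_weight = 0.0
    for evalue, terms in hits:
        weight = hit_weight(evalue)
        total_weight += weight
        for term in terms:
            totals[term] += weight
    if total_weight == 0:
        return {}
    return {term: value / total_weight for term, value in totals.items()}


# Made-up hits for a single query protein (illustrative values only).
hits = [
    (1e-50, {"GO:0003677", "GO:0006355"}),
    (1e-30, {"GO:0003677"}),
    (1e-20, {"GO:0005524"}),
]
scores = score_go_terms(hits)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term}\t{score:.2f}")
```

In this sketch a term supported by several strong hits ends up with a higher score than a term supported by a single weaker hit, which is the general behaviour a weighted transfer scheme aims for.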
Wei2GO is compared against the Argot2 and Argot2.5 web servers, which use a similar concept, and against DeepGOPlus, which acts as a reference. Wei2GO shows improved performance according to precision-recall curves, F scores, and S scores for the biological process and molecular function ontologies. Computational time is reduced from several hours for Argot2 and Argot2.5 to several minutes.
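For readers unfamiliar with how an F score is derived from a precision-recall curve, the following is a minimal sketch of a protein-centric maximum F score over score thresholds. The abstract does not state the exact evaluation protocol, so the averaging scheme (precision over proteins with at least one prediction, recall over all benchmark proteins), the threshold grid, and the function name `f_max` are assumptions; the S score is not covered here.

```python
def f_max(predictions, truth, thresholds=None):
    """Maximum F score over score thresholds (protein-centric sketch).

    predictions: {protein: {go_term: score}} with scores in [0, 1].
    truth:       {protein: set of true go_terms}.
    Precision is averaged over proteins with at least one prediction at
    the threshold; recall is averaged over all proteins in `truth` that
    have at least one true term.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for threshold in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            if not true_terms:
                continue  # proteins without annotations cannot be evaluated
            predicted = {
                term
                for term, score in predictions.get(protein, {}).items()
                if score >= threshold
            }
            overlap = len(predicted & true_terms)
            if predicted:
                precisions.append(overlap / len(predicted))
            recalls.append(overlap / len(true_terms))
        if not precisions or not recalls:
            continue
        precision = sum(precisions) / len(precisions)
        recall = sum(recalls) / len(recalls)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data with placeholder term names (illustrative values only).
example_predictions = {"P1": {"A": 0.9, "B": 0.4}, "P2": {"C": 0.7}}
example_truth = {"P1": {"A"}, "P2": {"C", "D"}}
print(round(f_max(example_predictions, example_truth), 3))
```

Each threshold yields one point on the precision-recall curve; the function reports the best harmonic mean of precision and recall found along that curve.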
Wei2GO is written in Python 3, and can be found at https://gitlab.com/mreijnders/Wei2GO.