Lavezzo Enrico, Falda Marco, Fontana Paolo, Bianco Luca, Toppo Stefano
Department of Molecular Medicine, University of Padova, Padova, Italy.
Istituto Agrario San Michele all'Adige Research and Innovation Centre, Foundation Edmund Mach, Trento, Italy.
Methods. 2016 Jan 15;93:15-23. doi: 10.1016/j.ymeth.2015.08.021. Epub 2015 Aug 28.
Argot2.5 (Annotation Retrieval of Gene Ontology Terms) is a web server designed to predict protein function. It is an updated version of Argot2, enriched with new features that enhance its usability and overall performance. The algorithm infers protein function by grouping Gene Ontology terms according to their semantic similarity. The tool was evaluated on two independent benchmarks and compared with Argot2, PANNZER, and a baseline method relying on BLAST; it outperformed them, owing to changes at critical steps of the pipeline. The most effective changes are: (a) selecting the input data from sequence similarity searches performed against a clustered version of the UniProt databank, together with a remodeling of the weights given to Pfam hits; and (b) applying taxonomic constraints to filter out annotations that cannot apply to proteins of the species under investigation. The taxonomic rules are derived from FunTaxIS, a tool developed in-house that extends the rules provided by the Gene Ontology Consortium. The web server is free for academic users and is available online at http://www.medcomp.medicina.unipd.it/Argot2-5/.
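The taxonomic filtering step described in point (b) can be illustrated with a minimal Python sketch. It is not the Argot2.5 or FunTaxIS implementation: the `TaxonRule` structure, the function names, the two example rules, and the scores are all hypothetical, chosen only to show how "only in taxon" and "never in taxon" constraints could discard predicted GO terms that are incompatible with the lineage of the query species.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonRule:
    """Hypothetical constraint attached to one GO term (not the FunTaxIS format)."""
    only_in: frozenset = frozenset()   # term is valid only inside these taxa
    never_in: frozenset = frozenset()  # term is invalid inside these taxa


def is_allowed(go_term, lineage, rules):
    """Return True if go_term is compatible with the species lineage."""
    rule = rules.get(go_term)
    if rule is None:
        return True  # no constraint known for this term
    if rule.only_in and not (rule.only_in & lineage):
        return False  # lineage lies outside every permitted taxon
    if rule.never_in & lineage:
        return False  # lineage intersects a forbidden taxon
    return True


def filter_predictions(predictions, lineage, rules):
    """Keep only (go_term, score) pairs that pass the taxonomic rules."""
    return [(term, score) for term, score in predictions
            if is_allowed(term, lineage, rules)]


# NCBI Taxonomy IDs: 9606 Homo sapiens, 40674 Mammalia,
# 2759 Eukaryota, 33090 Viridiplantae.
human_lineage = frozenset({9606, 40674, 2759})

# Illustrative rules only; they are assumptions, not taken from GO or FunTaxIS.
rules = {
    "GO:0015979": TaxonRule(only_in=frozenset({33090})),  # photosynthesis
    "GO:0007595": TaxonRule(only_in=frozenset({40674})),  # lactation
}

# Made-up prediction scores for a human query protein.
predictions = [("GO:0015979", 0.8), ("GO:0007595", 0.6), ("GO:0005515", 0.9)]

kept = filter_predictions(predictions, human_lineage, rules)
print(kept)
```

Under these assumed rules, the photosynthesis term is dropped for a human protein, while the lactation term and the unconstrained term are kept. Representing the lineage as a set of ancestor taxon IDs keeps each check to a set intersection.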