Almeida-e-Silva Danillo C, Vêncio Ricardo Z N
Department of Computing and Mathematics FFCLRP-USP, University of São Paulo, Ribeirão Preto, Brazil.
Biotechniques. 2015 Mar 1;58(3):140-2. doi: 10.2144/000114266. eCollection 2015 Mar.
Statistical Inference of Function Through Evolutionary Relationships (SIFTER) is a powerful computational platform for probabilistic protein domain annotation. Nevertheless, SIFTER is not widely used, likely due to usability and scalability issues. Here we present SIFTER-T (SIFTER Throughput-optimized), a substantial improvement over SIFTER's original proof-of-principle implementation. SIFTER-T is optimized for performance, allowing it to be used at genome-wide scale. Compared to SIFTER 2.0 on published test data sets, SIFTER-T achieved an 87-fold performance improvement for the module that recovers known annotations and a 72.3% speed increase for the gene tree generation module on quad-core machines, as well as a major decrease in memory usage during the realignment phase. This memory optimization allowed SIFTER's probabilistic method to handle an expanded set of proteins. The resulting improvements in performance and automation allowed us to build a web server that brings the power of Bayesian phylogenomic inference to the genomics community. SIFTER-T and its online interface are freely available under the GNU license at http://labpib.fmrp.usp.br/methods/SIFTER-t/ and https://github.com/dcasbioinfo/SIFTER-t.