Improved orthology inference with Hieranoid 2.

Author information

Kaduk Mateusz, Sonnhammer Erik

Affiliations

Department of Biochemistry and Biophysics, Stockholm University.

Science for Life Laboratory (SciLifeLab), Tomtebodavägen 23, Solna, Sweden.

Publication information

Bioinformatics. 2017 Apr 15;33(8):1154-1159. doi: 10.1093/bioinformatics/btw774.

DOI:10.1093/bioinformatics/btw774

PMID:28096085

Abstract

MOTIVATION

The initial step in many orthology inference methods is the computationally demanding establishment of all pairwise protein similarities across all analysed proteomes. The quadratic scaling with proteomes has become a major bottleneck. A remedy is offered by the Hieranoid algorithm which reduces the complexity to linear by hierarchically aggregating ortholog groups from InParanoid along a species tree.
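The scaling argument can be made concrete by counting comparisons. The sketch below is illustrative only and is not taken from the Hieranoid code: it assumes a fully resolved (binary) species tree written as nested tuples, and it assumes one pairwise comparison per internal node. The function names and the example species are hypothetical.

```python
def all_vs_all_comparisons(n_proteomes: int) -> int:
    """Number of proteome pairs when every proteome is compared to every other."""
    return n_proteomes * (n_proteomes - 1) // 2


def hierarchical_comparisons(tree) -> int:
    """Number of internal nodes in a binary species tree given as nested tuples.

    Assumption: each internal node costs one pairwise comparison between
    its two children (leaf proteomes or already-aggregated groups).
    """
    if isinstance(tree, str):  # a leaf, i.e. a single proteome
        return 0
    left, right = tree
    return 1 + hierarchical_comparisons(left) + hierarchical_comparisons(right)


# Hypothetical five-species tree, for illustration only.
species_tree = ((("human", "mouse"), "zebrafish"), ("fly", "worm"))

print(all_vs_all_comparisons(5))               # 10 pairs
print(hierarchical_comparisons(species_tree))  # 4 internal nodes
```

Under these assumptions a binary tree with N leaves has N - 1 internal nodes, so the number of comparison steps grows linearly with the number of proteomes, whereas the all-versus-all count grows as N(N - 1)/2.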

RESULTS

We have further developed the Hieranoid algorithm in many ways. Major improvements have been made to the construction of multiple sequence alignments and consensus sequences. Hieranoid version 2 was evaluated with standard benchmarks that reveal a dramatic increase in the coverage/accuracy tradeoff over version 1, such that it now compares favourably with the best methods. The new parallelized cluster mode allows Hieranoid to be run on large data sets in a much shorter timespan than InParanoid, yet at similar accuracy.
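To illustrate what a consensus sequence derived from a multiple sequence alignment is, the following minimal sketch applies a plain majority rule per alignment column. This is an assumption made for illustration; the abstract does not describe how Hieranoid 2 builds its consensus sequences, and the function name and toy alignment are hypothetical.

```python
from collections import Counter


def majority_consensus(alignment, gap="-"):
    """Return the most frequent non-gap residue of each alignment column.

    Columns that contain only gaps are skipped. Ties are resolved by
    whichever residue appears first in the column.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap]
        if residues:
            consensus.append(Counter(residues).most_common(1)[0][0])
    return "".join(consensus)


# Toy alignment of three hypothetical protein fragments.
toy_alignment = ["MKT-LV", "MKTALV", "MRTALI"]
print(majority_consensus(toy_alignment))  # MKTALV
```

A single representative sequence of this kind lets an aggregated group be compared to another group or proteome as if it were one sequence, which is the role consensus sequences play in a hierarchical scheme.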

CONTACT

mateusz.kaduk@scilifelab.se.

AVAILABILITY AND IMPLEMENTATION

Perl code freely available at http://hieranoid.sbc.su.se/ .

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
