Speeding up all-against-all protein comparisons while maintaining sensitivity by considering subsequence-level homology.

Affiliations

University College London, London, United Kingdom.

Swiss Institute of Bioinformatics, Zurich, Switzerland.

Publication information

PeerJ. 2014 Oct 7;2:e607. doi: 10.7717/peerj.607. eCollection 2014.

DOI:10.7717/peerj.607

PMID:25320677

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4193403/

Abstract

Orthology inference and other sequence analyses across multiple genomes typically start by performing exhaustive pairwise sequence comparisons, a process referred to as "all-against-all". As this process scales quadratically with the number of sequences analysed, this step can become a bottleneck, limiting the number of genomes that can be analysed simultaneously. Here, we explored ways of speeding up the all-against-all step while maintaining its sensitivity. By exploiting the transitivity of homology and, crucially, ensuring that homology is defined in terms of consistent protein subsequences, our proof-of-concept achieved a 4× speedup while recovering >99.6% of all homologs identified by the full all-against-all procedure on empirical sequence sets. In comparison, state-of-the-art k-mer approaches are orders of magnitude faster but recover only 3-14% of all homologous pairs. We also outline ideas to further improve the speed and recall of the new approach. An open source implementation is provided as part of the OMA standalone software at http://omabrowser.org/standalone.
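The transitivity argument can be made concrete with a toy sketch. The Python below is an illustration under stated assumptions, not the OMA standalone implementation. It assumes each verified alignment is stored as a hypothetical `Hit` record carrying the aligned region `[start, end)` on each sequence, and it uses an arbitrary `min_overlap` threshold of 20 residues. If A aligns to B and to C over overlapping parts of A, the pair (B, C) is reported as an inferred homolog that would not need its own alignment. Hits on disjoint regions of A yield no inference. The helper `num_pairs` shows the quadratic n(n-1)/2 count that a full all-against-all must align.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    """One verified alignment between sequences a and b (hypothetical record).

    Regions are half-open [start, end) residue intervals on each sequence.
    """
    a: str
    b: str
    a_region: tuple[int, int]
    b_region: tuple[int, int]


def num_pairs(n: int) -> int:
    """Unordered pairs a full all-against-all must align: n(n-1)/2."""
    return n * (n - 1) // 2


def overlaps(r1: tuple[int, int], r2: tuple[int, int], min_len: int) -> bool:
    """True if half-open intervals r1 and r2 share at least min_len residues."""
    return min(r1[1], r2[1]) - max(r1[0], r2[0]) >= min_len


def infer_transitive(hits: list[Hit], min_overlap: int = 20) -> set[tuple[str, str]]:
    """Infer pairs (b, c) from hits a~b and a~c whose regions on a overlap.

    Pairs that were already aligned directly are not reported again.
    """
    # For every sequence, record (partner, region on this sequence).
    by_seq: dict[str, list[tuple[str, tuple[int, int]]]] = defaultdict(list)
    direct: set[frozenset[str]] = set()
    for h in hits:
        by_seq[h.a].append((h.b, h.a_region))
        by_seq[h.b].append((h.a, h.b_region))
        direct.add(frozenset((h.a, h.b)))

    inferred: set[tuple[str, str]] = set()
    for partners in by_seq.values():
        for (b, rb), (c, rc) in combinations(partners, 2):
            if b == c or frozenset((b, c)) in direct:
                continue
            # Consistent subsequences: both hits must cover the same part of the shared sequence.
            if overlaps(rb, rc, min_overlap):
                inferred.add(tuple(sorted((b, c))))
    return inferred


if __name__ == "__main__":
    hits = [
        Hit("A", "B", (0, 100), (5, 105)),
        Hit("A", "C", (50, 150), (0, 100)),   # overlaps A~B on A[50:100]
        Hit("A", "D", (200, 300), (10, 110)),  # a different region of A
    ]
    print(num_pairs(4))            # 6
    print(infer_transitive(hits))  # {('B', 'C')}
```

Requiring overlap on the shared sequence keeps the shortcut from linking B and D merely because each matches a different domain of A. Without that check, homology would be chained through unrelated regions of multidomain proteins. How the actual method selects and verifies the pairs it skips may differ from this sketch.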
