Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada.
Department of Computer Science, Université de Sherbrooke, Sherbrooke, Canada.
Bioinformatics. 2018 Jul 1;34(13):i366-i375. doi: 10.1093/bioinformatics/bty242.
When gene duplication occurs, one of the copies may become free of selective pressure and evolve at an accelerated pace. This has important consequences for the prediction of orthology relationships, since two orthologous genes separated by divergence after duplication may differ in both sequence and function. In this work, we distinguish between primary orthologs, whose evolutionary path has not been affected by accelerated mutation rates, and secondary orthologs, whose path has. Similarity-based prediction methods tend to miss secondary orthologs, whereas phylogeny-based methods cannot separate primary from secondary orthologs. However, both types of orthology have applications in important areas such as gene function prediction and phylogenetic reconstruction, motivating the need for methods that can distinguish the two types.
We formalize the notion of divergence after duplication and provide a theoretical basis for the inference of primary and secondary orthologs. We then put these ideas into practice with the Hybrid Prediction of Paralogs and Orthologs (HyPPO) framework, which combines ideas from both similarity-based and phylogeny-based approaches. We apply our method to simulated and empirical datasets and show that we achieve superior accuracy in predicting primary orthologs, secondary orthologs and paralogs.
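To make the three relation types concrete, the following Python sketch classifies gene pairs on a small, fully labeled toy gene tree. It is an illustration of one simplified reading of the definitions above, not HyPPO's inference procedure: the tree, the node names, the `diverged` flag on each branch and the `classify` helper are all hypothetical, and the sketch assumes the duplication and speciation events and the accelerated branches are already known, which is precisely what a real method has to infer.

```python
from itertools import combinations

# Hypothetical toy gene tree: node -> (parent, event, diverged).
#   event    : "spec" or "dup" for internal nodes, None for leaves (genes).
#   diverged : True if the branch from the parent to this node evolved at an
#              accelerated rate (an assumed, pre-labeled input).
TREE = {
    "root": (None, "spec", False),   # speciation separating species A and B
    "d1":   ("root", "dup", False),  # duplication in the lineage of species A
    "a1":   ("d1", None, False),     # copy in A that kept its ancestral rate
    "a2":   ("d1", None, True),      # copy in A that diverged after duplication
    "b1":   ("root", None, False),   # single copy in species B
}

def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = TREE[node][0]
    return path

def classify(x, y):
    """Classify two genes as paralogs, primary orthologs or secondary orthologs."""
    px, py = path_to_root(x), path_to_root(y)
    ancestors_of_y = set(py)
    # The lowest common ancestor is the first ancestor of x that is also above y.
    lca = next(n for n in px if n in ancestors_of_y)
    if TREE[lca][1] == "dup":
        return "paralogs"
    # Each node strictly below the LCA stands for the branch leading to it.
    branches = px[:px.index(lca)] + py[:py.index(lca)]
    if any(TREE[n][2] for n in branches):
        return "secondary orthologs"
    return "primary orthologs"

if __name__ == "__main__":
    leaves = [n for n, (_, event, _) in TREE.items() if event is None]
    for x, y in combinations(leaves, 2):
        print(x, y, classify(x, y))
```

In this toy tree, `a1` and `b1` come out as primary orthologs, `a2` and `b1` as secondary orthologs (the branch leading to `a2` is flagged as accelerated), and `a1` and `a2` as paralogs.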
HyPPO is a modular framework with a core developed in Python and is provided with a variety of C++ modules. The source code is available at https://github.com/manuellafond/HyPPO.
Supplementary data are available at Bioinformatics online.