Jones Mark, Paul Christophe, Scornavacca Céline
LIRMM, CNRS, Université de Montpellier, Montpellier, France.
ISE-M, CNRS, IRD, EPHE, Université, Montpellier, France.
BMC Bioinformatics. 2016 Nov 11;17(Suppl 14):416. doi: 10.1186/s12859-016-1267-3.
Orthologs inference is the starting point of most comparative genomics studies, and a plethora of methods have been designed in the last decade to address this challenging task. In this paper we focus on the problems of deciding consistency with a species tree (known or not) of a partial set of orthology/paralogy relationships [Formula: see text] on a collection of n genes.
We give the first polynomial algorithm - more precisely a O(n ) time algorithm - to decide whether [Formula: see text] is consistent, even when the species tree is unknown. We also investigate a biologically meaningful optimization version of these problems, in which we wish to minimize the number of duplication events; unfortunately, we show that all these optimization problems are NP-hard and are unlikely to have good polynomial time approximation algorithms.
Our polynomial algorithm for checking consistency has been implemented in Python and is available at https://github.com/UdeM-LBIT/OrthoPara-ConstraintChecker .
直系同源物推断是大多数比较基因组学研究的起点,在过去十年中已经设计了大量方法来解决这一具有挑战性的任务。在本文中,我们关注在n个基因的集合上确定一组部分直系同源/旁系同源关系[公式:见正文]与物种树(已知或未知)一致性的问题。
我们给出了第一个多项式算法——更确切地说是一个O(n)时间算法——来判定[公式:见正文]是否一致,即使物种树未知。我们还研究了这些问题的一个具有生物学意义的优化版本,其中我们希望最小化复制事件的数量;不幸的是,我们表明所有这些优化问题都是NP难的,并且不太可能有良好的多项式时间近似算法。
我们用于检查一致性的多项式算法已用Python实现,可在https://github.com/UdeM-LBIT/OrthoPara-ConstraintChecker获取。