Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa, Canada K1N 6N5.
BMC Bioinformatics. 2012;13 Suppl 19(Suppl 19):S8. doi: 10.1186/1471-2105-13-S19-S8. Epub 2012 Dec 19.
It has recently been shown that fractionation, the random loss of excess gene copies after a whole genome duplication event, is a major cause of gene order disruption. When estimating evolutionary distances between genomes based on chromosomal rearrangement, fractionation inevitably leads to significant overestimation of classic rearrangement distances. This bias can be largely avoided when genomes are preprocessed by "consolidation", a procedure that identifies and accounts for regions of fractionation.
In this paper, we present a new consolidation algorithm that extends and improves previous work in several directions. We extend the notion of a fractionation region to incorporate information from regions where this process is still ongoing. The new algorithm can optionally work with this broader definition of fractionation region and is able to process not only tetraploids but also genomes that have undergone hexaploidization and higher-order polyploidization events. Finally, the algorithm reduces the asymptotic time complexity of consolidation from quadratic to linear in genome size. The new algorithm is applied both to plant genomes and to simulated data to study the effect of fractionation in ancient hexaploids.
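To illustrate the idea behind consolidation, the following sketch (hypothetical code, not the authors' implementation) models a post-duplication chromosome pair as two aligned rows over an ancestral gene order, with `-` marking a fractionated (lost) copy. A fractionation region is taken here as a maximal run of columns in which exactly one of the two homeologs retains each gene; consolidating such a run into a single ancestral segment prevents the random losses from inflating rearrangement distances. A single pass over the columns gives the linear-time behavior mentioned above.

```python
# Hedged sketch of consolidation; all names and the input encoding
# are illustrative assumptions, not the paper's data structures.

def consolidate(row_a, row_b):
    """One linear pass over aligned columns (O(n) in genome size)."""
    segments, run = [], []
    for a, b in zip(row_a, row_b):
        if (a == '-') != (b == '-'):          # exactly one copy survives
            run.append(a if a != '-' else b)  # record the surviving gene
        else:
            if run:                           # close the fractionation region
                segments.append(tuple(run))
                run = []
            segments.append((a, b))           # two-copy column kept as-is
    if run:
        segments.append(tuple(run))
    return segments

# Genes g1..g4, lost alternately from the two homeologs, form one
# fractionation region that is consolidated into a single segment.
row_a = ['x', 'g1', '-',  'g3', '-',  'y']
row_b = ['x', '-',  'g2', '-',  'g4', 'y']
print(consolidate(row_a, row_b))
# → [('x', 'x'), ('g1', 'g2', 'g3', 'g4'), ('y', 'y')]
```

After this preprocessing step, a classic rearrangement distance can be computed on the consolidated segments instead of the raw, fractionation-scrambled gene order.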