Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany.
G3 (Bethesda). 2012 Aug;2(8):883-9. doi: 10.1534/g3.112.002527. Epub 2012 Aug 1.
Comparative sequencing contributes critically to the functional annotation of genomes. One prerequisite for successful analysis of the increasingly abundant comparative sequencing data is the availability of efficient computational tools. We present here a strategy for comparing unaligned genomes based on a coalescent approach combined with advanced algorithms for indexing sequences. These algorithms are particularly efficient when analyzing large genomes, as their run time ideally grows only linearly with sequence length. Using this approach, we have derived and implemented a maximum-likelihood estimator of the average number of mismatches per site between two closely related sequences, π. By allowing for fluctuating coalescent times, we are able to improve a previously published alignment-free estimator of π. We show through simulation that our new estimator is fast and accurate even with moderate recombination (ρ ≤ π). To demonstrate its applicability to real data, we compare the unaligned genomes of Drosophila persimilis and D. pseudoobscura. In agreement with previous studies, our sliding window analysis locates the global divergence minimum between these two genomes to the pericentromeric region of chromosome 3.
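As a point of reference for the quantity being estimated, the following minimal sketch computes mismatches per site in sliding windows and reports the window with the lowest divergence. It is not the alignment-free maximum-likelihood estimator described above: it assumes the two sequences are already aligned to equal length, and the function names `window_divergence` and `global_minimum` are hypothetical, chosen only for this illustration.

```python
def window_divergence(seq_a, seq_b, window, step):
    """Return a list of (window_start, mismatches_per_site) tuples.

    Assumption: seq_a and seq_b are ALREADY aligned and of equal length.
    The estimator in the abstract works on unaligned genomes instead.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")

    results = []
    # Slide a fixed-size window along the pair and count differing sites.
    for start in range(0, len(seq_a) - window + 1, step):
        end = start + window
        mismatches = sum(x != y for x, y in zip(seq_a[start:end], seq_b[start:end]))
        results.append((start, mismatches / window))
    return results


def global_minimum(results):
    """Return the (window_start, divergence) tuple with the lowest divergence."""
    return min(results, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy sequences invented for illustration, not real Drosophila data.
    a = "ACGTACGTACGT"
    b = "ACGAACGTACCT"
    windows = window_divergence(a, b, window=4, step=4)
    print(windows)
    print(global_minimum(windows))
```

On real genomes the window and step sizes would be far larger, and the per-window value would come from the alignment-free estimator rather than from direct site-by-site comparison.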