Bernhard Haubold (corresponding author)
Brief Bioinform. 2014 May;15(3):407-18. doi: 10.1093/bib/bbt083. Epub 2013 Nov 29.
Phylogenetics and population genetics are central disciplines in evolutionary biology. Both are based on comparative data, today usually DNA sequences. These have become so plentiful that alignment-free sequence comparison is of growing importance in the race between scientists and sequencing machines. In phylogenetics, efficient distance computation is the major contribution of alignment-free methods. A distance measure should reflect the number of substitutions per site, which underlies classical alignment-based phylogeny reconstruction. Alignment-free distance measures are based either on word counts or on match lengths, and I apply examples of both approaches to simulated and real data to assess their accuracy and efficiency. While phylogeny reconstruction is based on the number of substitutions, population genetics also considers the distribution of mutations along a sequence. This distribution can be explored through match lengths, opening the prospect of alignment-free population genomics.
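The following is a minimal Python sketch of the two families of alignment-free measures the abstract names, applied to simulated sequences. It is not the author's software. The function names (`mutate`, `jukes_cantor`, `word_mismatch`, `match_lengths`) are hypothetical, and the estimators are deliberately simple stand-ins chosen for illustration. The sketch assumes random DNA in which each substitution replaces a site by one of the three other nucleotides with equal probability. Under that assumption the Jukes-Cantor correction converts an observed mismatch fraction p into substitutions per site. The word-count estimator further assumes that roughly a fraction (1 - p)^k of the k-mers of one sequence also occurs in the other, and it ignores chance k-mer hits. The match-length part computes, for every position of one sequence, the length of the longest match that starts there and occurs anywhere in the other sequence.

```python
import math
import random

# Hypothetical helpers for illustration; not the author's implementation.
ALPHABET = "ACGT"


def mutate(seq, p, rng):
    """Copy seq, replacing each site by a different nucleotide with probability p."""
    out = []
    for c in seq:
        if rng.random() < p:
            out.append(rng.choice([x for x in ALPHABET if x != c]))
        else:
            out.append(c)
    return "".join(out)


def jukes_cantor(p):
    """Convert a mismatch fraction p (< 0.75) into substitutions per site."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def word_mismatch(a, b, k=12):
    """Word-count sketch: estimate the mismatch fraction from shared k-mers.

    Assumes random sequences and ignores chance k-mer hits, so the
    fraction of k-mers of a that also occur in b is roughly (1 - p)**k.
    """
    words_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    n = len(a) - k + 1
    shared = sum(1 for i in range(n) if a[i:i + k] in words_b)
    return 1.0 - (shared / n) ** (1.0 / k)


def match_lengths(a, b):
    """Match-length sketch: for each position i of a, the length of the
    longest prefix of a[i:] that occurs anywhere in b."""
    lengths = []
    l = 0
    for i in range(len(a)):
        # Dropping the first character of the previous match leaves a
        # string that still occurs in b, so extension can resume from l - 1.
        l = max(l - 1, 0)
        while i + l < len(a) and a[i:i + l + 1] in b:
            l += 1
        lengths.append(l)
    return lengths


if __name__ == "__main__":
    rng = random.Random(1)
    a = "".join(rng.choice(ALPHABET) for _ in range(5000))
    for p in (0.02, 0.05, 0.10):
        b = mutate(a, p, rng)
        mismatch = sum(x != y for x, y in zip(a, b)) / len(a)
        mean_match = sum(match_lengths(a, b)) / len(a)
        print(f"p={p:.2f}  observed={mismatch:.3f}  "
              f"JC={jukes_cantor(mismatch):.3f}  "
              f"words={word_mismatch(a, b):.3f}  mean match={mean_match:.1f}")
```

For each simulated divergence, the script prints the observed mismatch fraction, its Jukes-Cantor distance, the word-count estimate, and the mean match length, which falls as divergence rises. Converting mean match length into substitutions per site would also require a correction for chance matches, whose expected length grows roughly with the logarithm of sequence length. The sketch omits that step.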