Morgenstern Burkhard, Schöbel Svenja, Leimeister Chris-André
Department of Bioinformatics, Institute of Microbiology and Genetics, University of Goettingen, Goldschmidtstr. 1, 37077 Göttingen, Germany.
Algorithms Mol Biol. 2017 Dec 11;12:27. doi: 10.1186/s13015-017-0118-8. eCollection 2017.
Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. Haubold et al. (J Comput Biol 16:1487-1500, 2009) showed how the average number of substitutions per position between two DNA sequences can be estimated based on the average length of exact common substrings.
In this paper, we study the length distribution of k-mismatch common substrings between two sequences. We show that the number of substitutions per position can be accurately estimated from the position of a local maximum in the length distribution of their k-mismatch common substrings.
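To make the object of study concrete, the following is a minimal brute-force sketch of how such a length distribution could be tabulated. It is not the authors' implementation. It assumes that a k-mismatch common substring starting at a pair of positions is obtained by extending to the right until just before the (k+1)-th mismatch or the end of either sequence. The function names `k_mismatch_length` and `length_distribution` are hypothetical, and the quadratic loop over all position pairs is only practical for short sequences.

```python
from collections import Counter


def k_mismatch_length(s, t, i, j, k):
    """Length of the gap-free match starting at s[i] and t[j], extended
    to the right until just before the (k+1)-th mismatch or until either
    sequence ends (assumed definition, see the lead-in)."""
    mismatches = 0
    length = 0
    while i + length < len(s) and j + length < len(t):
        if s[i + length] != t[j + length]:
            mismatches += 1
            if mismatches > k:
                break
        length += 1
    return length


def length_distribution(s, t, k):
    """Histogram mapping each extension length to the number of
    position pairs (i, j) that produce it. Naive O(|s| * |t| * L)."""
    counts = Counter()
    for i in range(len(s)):
        for j in range(len(t)):
            counts[k_mismatch_length(s, t, i, j, k)] += 1
    return counts


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "ACGAACGT"
    dist = length_distribution(a, b, k=1)
    for length in sorted(dist):
        print(length, dist[length])
```

On real genomic data one would inspect this histogram for a local maximum at larger lengths, separate from the bulk of short background matches. Converting the position of that maximum into a substitution rate depends on the estimator derived in the paper and is not reproduced here.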