Liu Xiaoqing, Yang Xiaohua, Wang Cong, Yao Yuhua, Dai Qi
School of Science, Hangzhou Dianzi University, Hangzhou 310018, China.
Shangqiu Medical College, Shangqiu 476100, China.
Comput Biol Med. 2015 Aug;63:287-92. doi: 10.1016/j.compbiomed.2015.02.017. Epub 2015 Mar 6.
Recent developments in sequence alignment have led to significant advances in our understanding of the functional, structural or evolutionary relationships among biological sequences. Great efforts have been made to count the total number of sequence alignments, but little attention has been paid to specific alignments associated with conserved patterns.
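As background for the "total number of sequence alignments", the sketch below counts pairwise global alignments under one common model. The model is an assumption here: each column is either an aligned pair of residues or a residue facing a gap, and gap-gap columns are not allowed. Under this model the total is the Delannoy number D(m, n). Other conventions, for example treating alignments that differ only in the order of adjacent gaps as the same alignment, give different totals. This sketch therefore does not reproduce Covington's counts or the paper's. The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


def count_alignments(m: int, n: int) -> int:
    """Count global alignments of two sequences of lengths m and n.

    Assumed model: each column is an aligned pair (match or mismatch), a
    residue of the first sequence against a gap, or a gap against a residue
    of the second sequence; gap-gap columns are not allowed.
    """
    total = 0
    for k in range(min(m, n) + 1):  # k = number of aligned-pair columns
        cols = m + n - k  # total number of columns in such an alignment
        # Arrange k pair columns, (m - k) deletions and (n - k) insertions
        # among the columns: multinomial cols! / (k! (m-k)! (n-k)!).
        total += comb(cols, k) * comb(cols - k, m - k)
    return total


@lru_cache(maxsize=None)
def count_alignments_dp(i: int, j: int) -> int:
    """Same count, obtained by recursing on the type of the last column."""
    if i == 0 or j == 0:
        return 1  # only one way: all remaining residues face gaps
    return (count_alignments_dp(i - 1, j - 1)   # last column: aligned pair
            + count_alignments_dp(i - 1, j)     # last column: residue vs gap
            + count_alignments_dp(i, j - 1))    # last column: gap vs residue


print(count_alignments(10, 10))  # 8097453
```

The memoized recursion is included only as an independent cross-check of the closed form. Even two sequences of length 10 already admit 8,097,453 alignments under this model.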
We propose a new combinatorial method for counting specific alignments. First, we represent a sequence alignment as a system of cells. Then, using combinatorial techniques and Stirling's formula, we count the number of specific alignments that contain a k-match or a match section of size k.
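The abstract does not give the derivations, so the following is only an illustration of how Stirling's formula, ln n! ≈ n ln n − n + ½ ln(2πn), is typically used in counts of this kind. It estimates the logarithm of a binomial coefficient, the building block of alignment counts, and compares the estimate with the exact value obtained from the log-gamma function. The helper names are hypothetical, and this is not the authors' derivation.

```python
import math


def log_factorial_stirling(n: int) -> float:
    """Stirling's approximation: ln n! ~ n ln n - n + 0.5 ln(2 pi n)."""
    if n == 0:
        return 0.0  # 0! = 1; the formula itself is not defined at n = 0
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)


def log_comb_stirling(n: int, k: int) -> float:
    """Approximate ln C(n, k) = ln n! - ln k! - ln (n-k)! via Stirling."""
    return (log_factorial_stirling(n)
            - log_factorial_stirling(k)
            - log_factorial_stirling(n - k))


def log_comb_exact(n: int, k: int) -> float:
    """Exact ln C(n, k) using the log-gamma function: ln n! = lgamma(n + 1)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


for n in (10, 100, 1000):
    k = n // 2
    exact = log_comb_exact(n, k)
    approx = log_comb_stirling(n, k)
    print(f"n={n}: ln C(n, n/2) exact={exact:.4f} stirling={approx:.4f}")
```

Working in log space keeps the numbers small even when the counts themselves have hundreds of digits. The absolute error of the Stirling estimate shrinks roughly like 1/n, which is why the approximation suits the long-sequence regime.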
We derived three theorems for different types of specific alignments. We found that the number of alignments with match sections of size at least k was smaller than the number of alignments with k-match sections, and that the number of specific alignments was significantly lower than the counts reported by Covington.
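The abstract does not define "k-match" or "match section" precisely. The sketch below therefore adopts a hypothetical definition: a match section of size k is a run of k consecutive columns, each pairing two identical residues. It is not the authors' closed-form method. It is a memoized dynamic program that can sanity-check small cases by counting, for two concrete sequences, the alignments that contain at least one such run. It then compares that number with the total. The column model is assumed: columns are aligned pairs or residue-vs-gap, with no gap-gap columns. The function name is illustrative.

```python
from functools import lru_cache


def count_alignments_with_match_run(s: str, t: str, k: int) -> tuple[int, int]:
    """Return (alignments containing a run of >= k identical pairs, all alignments).

    Hypothetical definition: a 'match section of size k' is k consecutive
    columns that each pair two identical residues. Columns are aligned pairs
    or residue-vs-gap; gap-gap columns are excluded. Requires k >= 1.
    """
    m, n = len(s), len(t)

    @lru_cache(maxsize=None)
    def avoid(i: int, j: int, r: int) -> int:
        # Ways to align the suffixes s[i:], t[j:] without ever completing a
        # run of k identical pairs, given the current run has length r < k.
        if i == m and j == n:
            return 1
        ways = 0
        if i < m and j < n:
            if s[i] == t[j]:
                if r + 1 < k:  # extending the run must stay below k
                    ways += avoid(i + 1, j + 1, r + 1)
            else:
                ways += avoid(i + 1, j + 1, 0)  # mismatch breaks the run
        if i < m:
            ways += avoid(i + 1, j, 0)  # residue of s against a gap
        if j < n:
            ways += avoid(i, j + 1, 0)  # gap against a residue of t
        return ways

    @lru_cache(maxsize=None)
    def total(i: int, j: int) -> int:
        # All alignments of the suffixes s[i:], t[j:] (a Delannoy number).
        if i == m or j == n:
            return 1
        return total(i + 1, j + 1) + total(i + 1, j) + total(i, j + 1)

    all_alignments = total(0, 0)
    return all_alignments - avoid(0, 0, 0), all_alignments


print(count_alignments_with_match_run("AC", "AC", 2))  # (1, 13)
```

Of the 13 alignments of "AC" with itself, only the gapless one contains two consecutive identical pairs. Counting the complement, meaning alignments that never complete a run, keeps the state small: the DP only tracks positions and the current run length.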
The sheer number of possible alignments makes a direct search for the optimal alignment infeasible for long sequences, whereas our proposed method, based on specific alignments, reduces the search space many-fold. This facilitates the development of faster algorithms for sequence comparison.
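The infeasibility of direct search can be made concrete with a standard combinatorial fact. This is not a result from the paper. The count uses an assumed column model in which each column is an aligned pair or a residue facing a gap, with no gap-gap columns. Under that model, two length-n sequences have D(n, n) alignments, the central Delannoy number. The ratio D(n, n)/D(n−1, n−1) approaches 3 + 2√2 ≈ 5.83. The function name below is illustrative.

```python
import math


def central_delannoy(n: int) -> int:
    """D(n, n): alignments of two length-n sequences under the assumed
    pair / residue-vs-gap column model (no gap-gap columns)."""
    # Standard identity: D(n, n) = sum_k 2^k * C(n, k)^2.
    return sum(2**k * math.comb(n, k) ** 2 for k in range(n + 1))


for n in (10, 100, 1000):
    d = central_delannoy(n)
    ratio = d / central_delannoy(n - 1)  # exact big-int division to float
    print(f"n={n}: D(n,n) has {len(str(d))} digits; "
          f"D(n,n)/D(n-1,n-1) = {ratio:.4f}")

print(f"limit 3 + 2*sqrt(2) = {3 + 2 * math.sqrt(2):.4f}")
```

Because the count grows by a factor of almost six per added residue in each sequence, any method that prunes the space by a constant factor per position shrinks it by an exponential overall factor.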