Yu Yi-Kuo
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
Phys Rev E Stat Nonlin Soft Matter Phys. 2004 Jun;69(6 Pt 1):061904. doi: 10.1103/PhysRevE.69.061904. Epub 2004 Jun 1.
Sequence alignment is one of the most important bioinformatics tools in modern molecular biology. The statistical characterization of gapped alignment scores has been a long-standing problem in sequence alignment research. In this paper, we provide a self-contained exposition of sequence alignment, a short review of how this problem relates to the directed polymer problem in statistical physics, and some analytical results that can be used to predict alignment score statistics. Specifically, we present two classes of solutions for gapped alignment statistics, obtained by explicitly calculating the evolution of the few-replica partition function in 1+1 dimensions. We obtain the conditions under which the more important extremal parameter, λ, which characterizes the alignment score statistics, becomes predictable.
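For orientation, the role of λ can be illustrated with the standard extreme-value (Gumbel) form commonly used for local alignment scores. This is only a sketch, assuming the paper follows the usual Karlin-Altschul convention; the symbols S, M, N, and K do not appear in the abstract and are introduced here purely for illustration.

```latex
% Assumed convention (not stated in the abstract): S is the optimal local
% alignment score of two random sequences of lengths M and N, and K is the
% second extreme-value parameter alongside \lambda.
P(S \ge x) \;\approx\; 1 - \exp\!\left(-K\,M\,N\,e^{-\lambda x}\right)
\;\approx\; K\,M\,N\,e^{-\lambda x} \quad \text{for large } x .
```

Under this form, λ sets the exponential decay rate of the score tail, so an error in λ shifts estimated significance much more than a comparable relative error in K. That is consistent with the abstract describing λ as the more important parameter.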