Where does the alignment score distribution shape come from?

Affiliation

CNRS (UMR 6191)-CEA Cadarache-Aix-Marseille Université, Laboratoire d'Ecologie Microbienne de la Rhizosphere, Institut de Biologie Environementale et Biotechnologie, CEA Cadarache, F-13108 Saint Paul-lez-Durance, France.

Publication information

Evol Bioinform Online. 2010 Dec 12;6:159-87. doi: 10.4137/EBO.S5875.

PMID:21258650

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3023300/

Abstract

Alignment algorithms are powerful tools for searching databases for homologous proteins, providing a score for each sequence in the database. It has been well known for 20 years that the score distribution has the shape of an extreme value distribution. The sheer frequency with which biologists encounter this class of distributions raises the question of the evolutionary origin of this probability law. We investigated the possibility of deriving the main properties of sequence alignment score distributions from a basic evolutionary process: a duplication-divergence process of protein evolution in a sequence space. First, the distribution of sequences in this space was defined with respect to the genetic distance between sequences. Second, we derived a basic relation between the genetic distance and the alignment score. We obtained a novel score probability distribution that is qualitatively very similar to the Karlin-Altschul distribution but performs better than all previous models.
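For readers less familiar with the point of comparison, the sketch below illustrates the classic Karlin-Altschul extreme value (Gumbel) approximation for chance local alignment scores: the expected number of chance hits is E = K * m * n * exp(-lambda * S), and the probability that the best chance score reaches S is P = 1 - exp(-E). It is not the novel distribution derived in this paper, whose form the abstract does not give. The function names and the parameter values (lambda = 0.27, K = 0.04, query length 300, database length 10^7) are hypothetical placeholders; in practice lambda and K depend on the scoring matrix and gap penalties, and edge-effect corrections to the effective lengths are ignored here.

```python
import math


def karlin_altschul_evalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance local alignments with score >= `score`.

    E = K * m * n * exp(-lambda * S), with m and n the query and database lengths.
    lambda and K depend on the scoring system (hypothetical values in the demo below).
    """
    return k * m * n * math.exp(-lam * score)


def karlin_altschul_pvalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Probability that the best chance score is >= `score`: P = 1 - exp(-E).

    -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny.
    """
    return -math.expm1(-karlin_altschul_evalue(score, m, n, lam, k))


def gumbel_location(m: int, n: int, lam: float, k: float) -> float:
    """Score mu = ln(K*m*n) / lambda at which E = 1; this is also the mode of the density."""
    return math.log(k * m * n) / lam


def gumbel_pdf(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Density of the best chance score: d/dS of the CDF exp(-E(S)), i.e. lambda * E * exp(-E)."""
    e = karlin_altschul_evalue(score, m, n, lam, k)
    return lam * e * math.exp(-e)


if __name__ == "__main__":
    # Illustrative, hypothetical parameters (not taken from the paper).
    m, n, lam, k = 300, 10_000_000, 0.27, 0.04
    mu = gumbel_location(m, n, lam, k)
    for s in (mu - 5, mu, mu + 10, mu + 30):
        print(f"S={s:6.1f}  E={karlin_altschul_evalue(s, m, n, lam, k):.3g}  "
              f"P={karlin_altschul_pvalue(s, m, n, lam, k):.3g}")
```

The asymmetric shape the abstract refers to shows up directly here: the density rises steeply below mu and decays exponentially (rate lambda) above it, so E-values shrink by a factor of exp(lambda) for every extra score unit.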