The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment.

Authors

Gu X, Li W H

Affiliation

Human Genetics Center, SPH, University of Texas, Houston 77225, USA.

Publication

J Mol Evol. 1995 Apr;40(4):464-73. doi: 10.1007/BF00164032.

Abstract

The size distributions of deletions, insertions, and indels (i.e., insertions or deletions) were studied, using 78 human processed pseudogenes and other published data sets. The following results were obtained: (1) Deletions occur more frequently than do insertions in sequence evolution; none of the pseudogenes studied shows significantly more insertions than deletions. (2) Empirically, the size distributions of deletions, insertions, and indels can be described well by a power law, i.e., $f_k = C k^{-b}$, where $f_k$ is the frequency of deletion, insertion, or indel with gap length $k$, $b$ is the power parameter, and $C$ is the normalization factor. (3) The estimates of $b$ for deletions and insertions from the same data set are approximately equal to each other, indicating that the size distributions for deletions and insertions are approximately identical. (4) The variation in the estimates of $b$ among various data sets is small, indicating that the effect of local structure exists but only plays a secondary role in the size distribution of deletions and insertions. (5) The linear gap penalty, which is most commonly used in sequence alignment, is not supported by our analysis; rather, the power law for the size distribution of indels suggests that an appropriate gap penalty is $w_k = a + b \ln k$, where $a$ is the gap creation cost and $b \ln k$ is the gap extension cost. (6) The higher frequency of deletion over insertion suggests that the gap creation cost of insertion ($a_i$) should be larger than that of deletion ($a_d$); that is, $a_i - a_d = \ln R$, where $R$ is the frequency ratio of deletions to insertions.
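
One way to read the connection between results (2), (5), and (6): if a gap of length k is scored by its negative log-frequency, then -ln f_k = -ln C + b ln k, which has the form a + b ln k with a = -ln C. The Python sketch below illustrates this and is not the authors' code. The function names, the truncation of the distribution at a maximum gap length, the least-squares fit on the log-log scale, and the value b = 1.5 are all assumptions made for the example; the abstract does not state how b was estimated or what values were obtained.

```python
import math


def power_law_frequencies(b, max_len):
    """Return f_k = C * k**(-b) for k = 1..max_len, normalized to sum to 1.

    Truncating at max_len is an assumption made so that C can be computed
    from a finite sum.
    """
    weights = [k ** (-b) for k in range(1, max_len + 1)]
    c = 1.0 / sum(weights)
    return [c * w for w in weights]


def estimate_b(freqs):
    """Estimate b as minus the least-squares slope of ln f_k against ln k.

    Illustrative only: requires at least two strictly positive frequencies,
    with freqs[0] corresponding to gap length k = 1.
    """
    xs = [math.log(k) for k in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


def log_gap_penalty(k, a, b):
    """Logarithmic gap penalty w_k = a + b * ln(k) for a gap of length k >= 1."""
    return a + b * math.log(k)


def insertion_creation_cost(a_del, ratio):
    """Gap creation cost for insertions, from a_i - a_d = ln R.

    ratio is R, the frequency ratio of deletions to insertions.
    """
    return a_del + math.log(ratio)


if __name__ == "__main__":
    b_true = 1.5  # hypothetical value, not taken from the paper
    freqs = power_law_frequencies(b_true, max_len=50)
    b_hat = estimate_b(freqs)
    a = -math.log(freqs[0])  # f_1 = C, so a = -ln C
    for k in (1, 2, 5, 10):
        print(k, round(log_gap_penalty(k, a, b_hat), 4))
```

Because the penalty grows with ln k rather than with k, doubling the length of a gap adds the same fixed amount, b ln 2, regardless of how long the gap already is. Under a linear penalty the added cost would instead grow in proportion to the gap length.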
