Hidden ancient repeats in DNA: mapping and quantification.

Affiliation

Department of Software Engineering, ORT Braude College, Karmiel, Israel.

Publication information

Gene. 2013 Oct 10;528(2):282-7. doi: 10.1016/j.gene.2013.06.059. Epub 2013 Jul 16.

Abstract

We have shown in a previous paper that tandem repeat sequences, especially triplet repeats, play a very important role in gene evolution. This result led to the following hypothesis: most genomic sequences evolved through continual tandem repeat expansions with subsequent accumulation of changes. To estimate how much of the observed sequence has a repeat origin, we describe the adaptation of a text segmentation algorithm, based on dynamic programming, to the mapping of ancient expansion events. The algorithm maximizes the segmentation cost, calculated as the similarity of the obtained fragments to the putative repeat sequence. In the first application of the algorithm to the segmentation of genomic sequences, a significant difference is detected between the natural sequences and the corresponding shuffled sequences: the natural fragments are longer and more similar to the putative repeat sequences. As our analysis shows, coding sequences allow repeats only when the size of the repeated word is divisible by three. In contrast, all repeated word sizes are present in non-coding sequences. It was estimated that in the Escherichia coli K12 genome, about 35.5% of the sequence can be detectably traced to original simple repeat ancestors. The results shed light on genomic sequence organization and strongly confirm the hypothesis about the crucial role of triplet expansions in gene origin and evolution.
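
The segmentation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the similarity measure, so the scoring function here (`lag_score`, which rewards a letter that equals the one `k` positions earlier), the `mismatch_penalty` parameter, the minimum fragment length of two words, and the option of leaving positions unassigned are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Fragment = Tuple[int, int, int]  # (start, end, word_size), end exclusive


def lag_score(fragment: str, k: int, mismatch_penalty: float = 1.0) -> float:
    """Assumed similarity measure: +1 for every letter equal to the letter
    k positions earlier, -mismatch_penalty otherwise."""
    score = 0.0
    for i in range(k, len(fragment)):
        score += 1.0 if fragment[i] == fragment[i - k] else -mismatch_penalty
    return score


def segment(seq: str,
            word_sizes: Sequence[int] = (1, 2, 3),
            mismatch_penalty: float = 1.0) -> Tuple[float, List[Fragment]]:
    """Dynamic programming over prefixes: best[i] is the maximum total score
    of any segmentation of seq[:i] into repeat-like fragments and
    unassigned positions (which score 0)."""
    n = len(seq)
    best = [0.0] * (n + 1)
    back: List[Optional[Tuple[int, int]]] = [None] * (n + 1)
    for i in range(1, n + 1):
        best[i] = best[i - 1]          # leave position i-1 unassigned
        for k in word_sizes:
            # a fragment must hold at least two copies of the word
            for j in range(0, i - 2 * k + 1):
                cand = best[j] + lag_score(seq[j:i], k, mismatch_penalty)
                if cand > best[i]:
                    best[i] = cand
                    back[i] = (j, k)
    # trace back the chosen fragments from the end of the sequence
    fragments: List[Fragment] = []
    i = n
    while i > 0:
        if back[i] is None:
            i -= 1
        else:
            j, k = back[i]
            fragments.append((j, i, k))
            i = j
    fragments.reverse()
    return best[n], fragments


print(segment("ACGACGACGACG"))  # (9.0, [(0, 12, 3)])
print(segment("ACGT"))          # (0.0, [])
```

Under these assumptions, the fraction of positions covered by the returned fragments gives a rough analogue of the "traceable to repeat ancestors" estimate, and running the same function on a shuffled copy of the sequence gives a baseline for comparison. The naive scoring loop makes this sketch cubic in sequence length, so it is suited only to short examples.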

