Introns form compositional clusters in parallel with the compositional clusters of the coding sequences to which they pertain.

Affiliation

Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain.

Publication information

J Mol Evol. 2011 Jan;72(1):1-13. doi: 10.1007/s00239-010-9411-6. Epub 2010 Dec 4.

Abstract

This report studies the compositional properties of human gene sequences, evaluating similarities and differences among functionally distinct sectors of the gene independently of the reading frame. To retrieve the compositional information of DNA, we present a neighbor-base-dependent coding system in which the alphabet of 64 letters (DNA triplets) is compressed to an alphabet of 14 letters, here termed triplet composons. Triplets that contain the same set of distinct bases, regardless of order and multiplicity, form a triplet composon. The DNA sequence is read starting at any letter of the initial triplet and then moving, triplet-to-triplet, until the end of the sequence. The readings were made in an overlapping way along the length of the sequences. The analysis of the compositional content of the gene sequences, in terms of composon usage frequencies, shows that: (i) the compositional content of the sequences is far from that of random sequences, even in the case of non-protein-coding sequences; (ii) coding sequences can be classified as components of compositional clusters; and (iii) intron sequences in a cluster have the same composon usage frequencies, even though their base composition differs notably from that of their home coding sequences. A comparison of the composon usage frequencies between human and mouse homologous genes indicated that two clusters found in humans have no counterpart in mouse, whereas the other clusters are stable in both species with respect to their composon usage frequencies in both coding and noncoding sequences.
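The composon coding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `composon` and `composon_frequencies`, the use of the sorted distinct bases as a composon label, and the reading of "overlapping" as a window that advances one base at a time are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def composon(triplet):
    """Return a label for the triplet's composon.

    Assumption: a composon is identified by the set of distinct bases
    in the triplet, written here as a sorted string (e.g. "AAC" -> "AC").
    """
    return "".join(sorted(set(triplet)))


# Enumerate all 64 triplets and collapse them into composon labels.
ALL_COMPOSONS = sorted({composon("".join(t)) for t in product(BASES, repeat=3)})


def composon_frequencies(seq):
    """Relative composon usage over overlapping triplets of seq.

    Assumption: "overlapping" is taken to mean a sliding window with
    step 1, which covers the readings started at each of the three
    letters of the initial triplet. Windows containing symbols other
    than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if set(triplet) <= set(BASES):
            counts[composon(triplet)] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

Under these assumptions the 64 triplets collapse into 4 one-base, 6 two-base, and 4 three-base sets, which gives the 14 composons mentioned in the abstract. Calling `composon_frequencies` on a coding sequence and on its introns would yield two frequency vectors that could then be compared; how the paper actually compares or clusters them is not specified here.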

