Analysis of nucleotide distribution in the genome of Streptomyces coelicolor A3(2) using the Z curve method.

Authors

Ou Hong-Yu, Guo Feng-Biao, Zhang Chun-Ting

Affiliation

Department of Physics, Tianjin University, Tianjin 300072, PR China.

Publication

FEBS Lett. 2003 Apr 10;540(1-3):188-94. doi: 10.1016/s0014-5793(03)00263-1.

Abstract

The nucleotide distribution of all 33,527 open reading frames (ORFs) (≥300 bp) in the genome of Streptomyces coelicolor A3(2) has been analyzed using the Z curve method. Each ORF is mapped onto a point in a 9-dimensional space. To visualize the distribution of the mapping points, they are projected onto the principal plane obtained by principal component analysis. The resulting distribution of the 33,527 points in the principal plane has a flower-like shape with seven distinct regions: a central region surrounded by six petal-like regions, one of which corresponds to 7172 coding sequences. The central region corresponds to the intergenic sequences, and the remaining five petal-like regions correspond to out-of-frame non-coding ORFs. It is shown that selective pressure produces a remarkable bias of the G+C content among the three codon positions, which gives rise to the observed pattern. A similar pattern is observed for other bacterial genomes with high genomic G+C content, such as Pseudomonas aeruginosa PAO1 (G+C = 66.6%). However, no similar pattern is observed for the genomes of Bacillus subtilis (G+C = 43.5%) and Clostridium perfringens (G+C = 28.6%). The finding presented here may be useful for improving gene-finding algorithms for genomes with high G+C content. A set of supplementary materials, including plots displaying the base distribution patterns of ORFs in 12 prokaryotes, is provided on the website http://tubic.tju.edu.cn/highGC/.
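
The abstract does not spell out the mapping, so the following is only a minimal sketch of how an ORF might be mapped to a 9-dimensional point and then projected onto a principal plane. It assumes the commonly used phase-specific Z curve components (purine/pyrimidine, amino/keto, and weak/strong hydrogen-bond differences computed from base frequencies at each of the three codon positions) and a plain SVD-based PCA. The function names `z_curve_9d` and `project_to_principal_plane` are hypothetical and are not taken from the paper.

```python
import numpy as np

BASES = "ACGT"

def z_curve_9d(orf: str) -> np.ndarray:
    """Map an ORF to 9 values: (x, y, z) for each of the 3 codon positions.

    Assumed definitions (standard Z curve form, not quoted from the abstract):
      x = (a + g) - (c + t)   purine vs. pyrimidine
      y = (a + c) - (g + t)   amino vs. keto
      z = (a + t) - (g + c)   weak vs. strong hydrogen bonding
    where a, c, g, t are base frequencies at one codon position.
    """
    seq = orf.upper()
    components = []
    for phase in range(3):
        column = seq[phase::3]  # every base at this codon position
        total = sum(column.count(b) for b in BASES)
        if total == 0:
            raise ValueError("no A/C/G/T bases at codon position %d" % (phase + 1))
        a, c, g, t = (column.count(b) / total for b in BASES)
        components += [(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)]
    return np.array(components)

def project_to_principal_plane(points: np.ndarray) -> np.ndarray:
    """Project n points (n x 9) onto their first two principal axes via SVD."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Toy usage with made-up sequences (not data from the paper).
toy_orfs = ["ATGGCCGGCACC", "ATGAAATTTGAA", "GTGCGCGCCGGG", "ATGACCTTAGCA"]
points = np.vstack([z_curve_9d(s) for s in toy_orfs])
plane = project_to_principal_plane(points)
```

With G+C-rich coding sequences, the z component at the third codon position would be expected to be strongly negative, which is the kind of position-specific G+C bias the abstract attributes the petal pattern to.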
