Middle-range clustering of nucleotides in genomes.

Authors

Mrázek J, Kypr J

Affiliation

Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno.

Publication information

Comput Appl Biosci. 1995 Apr;11(2):195-9. doi: 10.1093/bioinformatics/11.2.195.

Abstract

We propose a novel, transparent and very simple algorithm to analyze middle-range correlations in genomic nucleotide sequences. Analysis by this algorithm of the EMBL Nucleotide Sequence Database demonstrates that all four nucleotides cluster in the genomic nucleotide sequences of eukaryotes on the scale of several hundred base pairs. In prokaryotes, the clustering is weak but still evident. The non-dominant three bases are deficient in the clusters, while A is the most deficient nucleotide in the clusters of C, and vice versa, and G is the most deficient nucleotide in the clusters of T, and vice versa. The algorithm also detects CG islands, extending over 1 kb, in vertebrate sequences. In plants, the CG islands are shown to be much smaller, if they exist at all. A clustering tendency is also exhibited by the TA doublet. Other doublets do not cluster. We observe no strong correlation between nucleotides separated in genomes by > 1 kb.
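
The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of one generic way to probe clustering at a given distance, not the authors' method. It compares how often a base `other` appears `d` positions downstream of a base `base` with the overall frequency of `other` in the sequence. The function name `distance_enrichment` and its parameters are hypothetical, introduced here for illustration. A ratio above 1 at many distances would suggest clustering, and a ratio below 1 would suggest deficiency.

```python
from collections import Counter


def distance_enrichment(seq, base, other, max_dist):
    """Observed/expected frequency of `other` at distances 1..max_dist
    downstream of each occurrence of `base`.

    Illustrative assumption only: this is not the algorithm from the paper.
    """
    seq = seq.upper()
    n = len(seq)
    # Background (expected) frequency of `other` over the whole sequence.
    background = Counter(seq)[other] / n if n else 0.0
    positions = [i for i, ch in enumerate(seq) if ch == base]

    ratios = []
    for d in range(1, max_dist + 1):
        hits = 0
        total = 0
        for i in positions:
            j = i + d
            if j < n:  # skip pairs that run past the end of the sequence
                total += 1
                if seq[j] == other:
                    hits += 1
        if total == 0 or background == 0:
            ratios.append(float("nan"))
        else:
            ratios.append((hits / total) / background)
    return ratios


# Toy usage: a block of A followed by a block of C.
toy = "A" * 50 + "C" * 50
aa = distance_enrichment(toy, "A", "A", 3)  # A near A: ratios above 1
ac = distance_enrichment(toy, "A", "C", 3)  # C near A: ratios below 1
```

On real data, the same ratio would be computed over many sequences and plotted against distance to see the scale at which it decays toward 1.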

Similar articles

1

Middle-range clustering of nucleotides in genomes.

Comput Appl Biosci. 1995 Apr;11(2):195-9. doi: 10.1093/bioinformatics/11.2.195.

2

Statistical properties of nucleotide clusters in DNA sequences.

J Zhejiang Univ Sci B. 2005 May;6(5):408-12. doi: 10.1631/jzus.2005.B0408.

3

Variations of the mononucleotide and short oligonucleotide distributions in the genomes of various organisms.

J Theor Biol. 1999 Nov 21;201(2):141-56. doi: 10.1006/jtbi.1999.1019.

4

An improved model for whole genome phylogenetic analysis by Fourier transform.

J Theor Biol. 2015 Oct 7;382:99-110. doi: 10.1016/j.jtbi.2015.06.033. Epub 2015 Jul 4.

5

A novel clustering method via nucleotide-based Fourier power spectrum analysis.

J Theor Biol. 2011 Jun 21;279(1):83-9. doi: 10.1016/j.jtbi.2011.03.029. Epub 2011 Apr 2.

6

A close relationship between primary nucleotides sequence structure and the composition of functional genes in the genome of prokaryotes.

Mol Phylogenet Evol. 2011 Dec;61(3):650-8. doi: 10.1016/j.ympev.2011.08.011. Epub 2011 Aug 16.

7

Scaling behavior of nucleotide cluster in DNA sequences.

J Zhejiang Univ Sci B. 2007 May;8(5):359-64. doi: 10.1631/jzus.2007.B0359.

8

Selective intra-dinucleotide interactions and periodicities of bases separated by K sites: a new vision and tool for phylogeny analyses.

Biol Res. 2017 Feb 13;50(1):3. doi: 10.1186/s40659-017-0112-0.

9

Thermodynamic stability of base pairs between 2-hydroxyadenine and incoming nucleotides as a determinant of nucleotide incorporation specificity during replication.

Nucleic Acids Res. 2001 Aug 15;29(16):3289-96. doi: 10.1093/nar/29.16.3289.

10

Revealing the Presence of a Symbolic Sequence Representing Multiple Nucleotides Based on K-Means Clustering of Oligonucleotides.

Molecules. 2019 Jan 18;24(2):348. doi: 10.3390/molecules24020348.

Cited by

1

Evolution of genomic sequence inhomogeneity at mid-range scales.

BMC Genomics. 2009 Nov 5;10:513. doi: 10.1186/1471-2164-10-513.

2

Mosaic structure of the DNA molecules of the human chromosomes 21 and 22.

Mol Biol Rep. 2001 Mar;28(1):9-17. doi: 10.1023/a:1011946803143.

3

Conformational properties of DNA strands containing guanine-adenine and thymine-adenine repeats.

Nucleic Acids Res. 1998 Mar 15;26(6):1509-14. doi: 10.1093/nar/26.6.1509.
