SEGMENT: identifying compositional domains in DNA sequences.

Author information

Oliver J L, Román-Roldán R, Pérez J, Bernaola-Galván P

Affiliations

Department of Genetics, Faculty of Sciences, University of Granada, Spain.

Publication information

Bioinformatics. 1999 Dec;15(12):974-9. doi: 10.1093/bioinformatics/15.12.974.

PMID:10745986

Abstract

MOTIVATION

DNA sequences are formed by patches or domains of different nucleotide composition. In a few simple sequences, domains can simply be identified by eye; however, most DNA sequences show a complex compositional heterogeneity (fractal structure), which cannot be properly detected by current methods. Recently, a computationally efficient segmentation method to analyse such nonstationary sequence structures, based on the Jensen-Shannon entropic divergence, has been described. Specific algorithms implementing this method are now needed.

RESULTS

Here we describe a heuristic segmentation algorithm for DNA sequences, which was implemented on a Windows program (SEGMENT). The program divides a DNA sequence into compositionally homogeneous domains by iterating a local optimization procedure at a given statistical significance. Once a sequence is partitioned into domains, a global measure of sequence compositional complexity (SCC), accounting for both the sizes and compositional biases of all the domains in the sequence, is derived. SEGMENT computes SCC as a function of the significance level, which provides a multiscale view of sequence complexity.
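The single-cut step behind this kind of segmentation can be sketched as follows. This is a minimal illustration, not the SEGMENT program itself: the function names (`shannon_entropy`, `js_divergence`, `best_cut`, `scc`) are hypothetical, the significance test and the iteration over subdomains are omitted, and the `scc` function assumes that SCC can be written as the length-weighted Jensen-Shannon divergence among all domains, which the abstract does not state explicitly.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two subsequences."""
    n = len(left) + len(right)
    return (shannon_entropy(left + right)
            - (len(left) / n) * shannon_entropy(left)
            - (len(right) / n) * shannon_entropy(right))


def best_cut(seq, min_len=1):
    """Return (position, divergence) of the cut that maximizes the divergence.

    Brute force: entropies are recomputed at every position, so this is
    quadratic in the sequence length and only suitable for short examples.
    """
    best_pos, best_d = None, -1.0
    for i in range(min_len, len(seq) - min_len + 1):
        d = js_divergence(seq[:i], seq[i:])
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def scc(seq, boundaries):
    """Assumed SCC: generalized JS divergence among the domains.

    boundaries is a sorted list of cut positions strictly inside seq.
    """
    cuts = [0, *boundaries, len(seq)]
    domains = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    return shannon_entropy(seq) - sum(
        (len(d) / len(seq)) * shannon_entropy(d) for d in domains)
```

For a toy sequence made of two pure domains, `best_cut("A" * 10 + "G" * 10)` returns `(10, 1.0)`: the cut falls on the true boundary and the divergence equals the full one-bit entropy of the whole sequence. A complete procedure would accept a cut only if its divergence passes the chosen significance threshold, then repeat the search within each resulting domain.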
