SEGMENT: identifying compositional domains in DNA sequences.

Author information

Oliver J L, Román-Roldán R, Pérez J, Bernaola-Galván P

Affiliation

Department of Genetics, Faculty of Sciences, University of Granada, Spain.

Publication information

Bioinformatics. 1999 Dec;15(12):974-9. doi: 10.1093/bioinformatics/15.12.974.

Abstract

MOTIVATION

DNA sequences are formed by patches or domains of different nucleotide composition. In a few simple sequences, domains can simply be identified by eye; however, most DNA sequences show a complex compositional heterogeneity (fractal structure), which cannot be properly detected by current methods. Recently, a computationally efficient segmentation method to analyse such nonstationary sequence structures, based on the Jensen-Shannon entropic divergence, has been described. Specific algorithms implementing this method are now needed.

RESULTS

Here we describe a heuristic segmentation algorithm for DNA sequences, which was implemented as a Windows program (SEGMENT). The program divides a DNA sequence into compositionally homogeneous domains by iterating a local optimization procedure at a given statistical significance. Once a sequence is partitioned into domains, a global measure of sequence compositional complexity (SCC), accounting for both the sizes and compositional biases of all the domains in the sequence, is derived. SEGMENT computes SCC as a function of the significance level, which provides a multiscale view of sequence complexity.
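The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea, not the SEGMENT program itself. It assumes a recursive scheme: find the cut that maximizes the length-weighted Jensen-Shannon divergence between the left and right parts, accept the cut if the divergence is large enough, and repeat on each part. The names `shannon_entropy`, `js_divergence`, `segment`, `scc` and the parameter `min_divergence` are hypothetical. In particular, `min_divergence` is a plain divergence threshold standing in for the statistical significance level mentioned above, and the SCC formula (entropy of the whole sequence minus the size-weighted entropies of its domains) is an assumed definition that is merely consistent with the description "accounting for both the sizes and compositional biases of all the domains".

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol composition of seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def js_divergence(left, right):
    """Jensen-Shannon divergence between two subsequences,
    with weights proportional to their lengths."""
    n = len(left) + len(right)
    return (shannon_entropy(left + right)
            - len(left) / n * shannon_entropy(left)
            - len(right) / n * shannon_entropy(right))


def segment(seq, min_divergence=0.1, start=0):
    """Recursively split seq into domains; return a list of (start, end) bounds.

    ASSUMPTION: a cut is accepted when its divergence reaches min_divergence.
    This simple threshold replaces the significance test used by the method.
    The scan over all cut points is quadratic, which is fine for a sketch.
    """
    if len(seq) < 2:
        return [(start, start + len(seq))]
    # Local optimization step: the cut position with maximal divergence.
    pos, best = max(
        ((i, js_divergence(seq[:i], seq[i:])) for i in range(1, len(seq))),
        key=lambda item: item[1],
    )
    if best < min_divergence:
        return [(start, start + len(seq))]
    return (segment(seq[:pos], min_divergence, start)
            + segment(seq[pos:], min_divergence, start + pos))


def scc(seq, domains):
    """ASSUMED complexity measure: entropy of the whole sequence minus the
    size-weighted entropies of its domains (zero for a single domain)."""
    n = len(seq)
    return shannon_entropy(seq) - sum(
        (end - begin) / n * shannon_entropy(seq[begin:end])
        for begin, end in domains
    )


# Toy sequence with two obvious compositional domains.
toy = "A" * 50 + "T" * 50

# Sweeping the threshold mimics a multiscale view: stricter thresholds
# accept fewer cuts and therefore yield fewer domains.
for threshold in (0.05, 0.5, 2.0):
    domains = segment(toy, min_divergence=threshold)
    print(threshold, domains, round(scc(toy, domains), 3))
```

Sweeping the threshold in the final loop is meant to mirror, in a loose way, how SCC is reported as a function of the significance level. A faithful implementation would replace the fixed threshold with a proper significance test on the maximum divergence.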
