Local Renyi entropic profiles of DNA sequences.

Author information

Vinga Susana, Almeida Jonas S

Affiliation

Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento (INESC-ID), R. Alves Redol 9, 1000-029 Lisboa, Portugal.

Publication information

BMC Bioinformatics. 2007 Oct 16;8:393. doi: 10.1186/1471-2105-8-393.

Abstract

BACKGROUND

In a recent report, the authors presented a new measure of continuous entropy for DNA sequences that allows their level of randomness to be estimated. The definition explored there was based on the Rényi entropy of a probability density function (pdf) estimated with the Parzen window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concept of continuous entropy by defining DNA sequence entropic profiles, using the new pdf estimations to refine the density estimation of motifs.
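The ingredients named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Matlab implementation. It assumes a common CGR corner assignment (A, C, G, T on the corners of the unit square), an isotropic Gaussian Parzen kernel of width `sigma`, and the quadratic (order 2) Rényi entropy, for which the integral of the squared density estimate reduces to a mean of pairwise Gaussians with variance 2σ². Truncation of the kernels at the border of the unit square is ignored. The function names are hypothetical.

```python
import numpy as np

# Assumed corner layout for the CGR unit square (a common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Map a DNA string to CGR coordinates, one 2-D point per symbol."""
    pts = np.empty((len(seq), 2))
    current = np.array([0.5, 0.5])  # start at the centre of the square
    for i, base in enumerate(seq):
        # Move halfway towards the corner of the current symbol.
        current = (current + np.array(CORNERS[base])) / 2.0
        pts[i] = current
    return pts


def renyi_quadratic_entropy(points, sigma):
    """Quadratic Renyi entropy of a Gaussian Parzen density estimate.

    H2 = -ln( (1/N^2) * sum_i sum_j G(x_i - x_j; 2*sigma^2) )
    """
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)
    var = 2.0 * sigma ** 2  # variance of the convolution of two kernels
    kernel = np.exp(-sq_dist / (2.0 * var)) / (2.0 * np.pi * var)
    return -np.log(kernel.sum() / n ** 2)
```

Under these assumptions, a highly repetitive sequence concentrates its CGR points in a small region and yields a lower entropy than a sequence whose points spread over the whole square.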

RESULTS

The new methodology yields two results. First, it shows that the entropic profiles are directly related to the statistical significance of motifs, allowing the study of under- and over-representation of segments. Second, by spanning the parameters of the kernel function, it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, together with the corresponding binary executables, additional material, and examples, are publicly available at http://kdbio.inesc-id.pt/~svinga/ep/.
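The link between local density and the over- or under-representation of segments can be illustrated with plain word counts. In a CGR, all positions that share the same last k symbols fall in the same sub-square of side 2^-k, so the count of that k-mer measures the local density at that scale. The sketch below is not the kernel-based profile defined in the paper. It assumes a uniform i.i.d. background, under which each k-mer is expected in a fraction 1/4^k of the windows, and the function name `suffix_representation_profile` is hypothetical.

```python
from collections import Counter


def suffix_representation_profile(seq, position, max_len):
    """Observed/expected ratio of the words ending at `position`.

    `position` is the 0-based index of the last symbol of each word.
    Returns {k: ratio} for word lengths k = 1..max_len (as far as the
    sequence allows). A ratio above 1 suggests over-representation and
    a ratio below 1 under-representation, relative to a uniform
    i.i.d. background (an assumption of this sketch).
    """
    n = len(seq)
    profile = {}
    for k in range(1, max_len + 1):
        start = position - k + 1
        if start < 0:
            break  # not enough preceding symbols for a word of length k
        word = seq[start:position + 1]
        # Count every overlapping k-mer in the sequence.
        counts = Counter(seq[i:i + k] for i in range(n - k + 1))
        expected = (n - k + 1) / 4 ** k
        profile[k] = counts[word] / expected
    return profile
```

Scanning k plays a role loosely analogous to spanning the kernel parameters: the length at which the ratio peaks hints at the scale of the conserved segment.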

CONCLUSION

The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.

Similar articles

1
Local Renyi entropic profiles of DNA sequences.
BMC Bioinformatics. 2007 Oct 16;8:393. doi: 10.1186/1471-2105-8-393.
2
Entropic Profiler - detection of conservation in genomes using information theory.
BMC Res Notes. 2009 May 5;2:72. doi: 10.1186/1756-0500-2-72.
3
Rényi continuous entropy of DNA sequences.
J Theor Biol. 2004 Dec 7;231(3):377-88. doi: 10.1016/j.jtbi.2004.06.030.
4
Entropic profiles of DNA sequences through chaos-game-derived images.
J Theor Biol. 1993 Feb 21;160(4):457-70. doi: 10.1006/jtbi.1993.1030.
5
Universal sequence map (USM) of arbitrary discrete sequences.
BMC Bioinformatics. 2002;3:6. doi: 10.1186/1471-2105-3-6. Epub 2002 Feb 5.
6
Computing distribution of scale independent motifs in biological sequences.
Algorithms Mol Biol. 2006 Oct 18;1:18. doi: 10.1186/1748-7188-1-18.
7
Analysis of genomic sequences by Chaos Game Representation.
Bioinformatics. 2001 May;17(5):429-37. doi: 10.1093/bioinformatics/17.5.429.
8
Counting of oligomers in sequences generated by markov chains for DNA motif discovery.
J Bioinform Comput Biol. 2009 Feb;7(1):39-54. doi: 10.1142/s0219720009003935.
9
Generalization of entropy based divergence measures for symbolic sequence analysis.
PLoS One. 2014 Apr 11;9(4):e93532. doi: 10.1371/journal.pone.0093532. eCollection 2014.
10
Biological sequences as pictures: a generic two dimensional solution for iterated maps.
BMC Bioinformatics. 2009 Mar 31;10:100. doi: 10.1186/1471-2105-10-100.

Cited by

1
AlcoR: alignment-free simulation, mapping, and visualization of low-complexity regions in biological data.
Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad101. Epub 2023 Dec 13.
2
Spatial distribution of the Shannon entropy for mass spectrometry imaging.
PLoS One. 2023 Apr 6;18(4):e0283966. doi: 10.1371/journal.pone.0283966. eCollection 2023.
3
Towards More Efficient Rényi Entropy Estimation.
Entropy (Basel). 2023 Jan 17;25(2):185. doi: 10.3390/e25020185.
5
Spatial constrains and information content of sub-genomic regions of the human genome.
iScience. 2021 Jan 10;24(2):102048. doi: 10.1016/j.isci.2021.102048. eCollection 2021 Feb 19.
6
Integrated entropy-based approach for analyzing exons and introns in DNA sequences.
BMC Bioinformatics. 2019 Jun 10;20(Suppl 8):283. doi: 10.1186/s12859-019-2772-y.
7
Additive methods for genomic signatures.
BMC Bioinformatics. 2016 Aug 22;17(1):313. doi: 10.1186/s12859-016-1157-8.
8
Informational laws of genome structures.
Sci Rep. 2016 Jun 29;6:28840. doi: 10.1038/srep28840.
9
On the comparison of regulatory sequences with multiple resolution Entropic Profiles.
BMC Bioinformatics. 2016 Mar 18;17:130. doi: 10.1186/s12859-016-0980-2.
10
DNA sequences at a glance.
PLoS One. 2013 Nov 21;8(11):e79922. doi: 10.1371/journal.pone.0079922. eCollection 2013.

References

1
Validating the significance of genomic properties of Chi sites from the distribution of all octamers in Escherichia coli.
Gene. 2007 May 1;392(1-2):239-46. doi: 10.1016/j.gene.2006.12.022. Epub 2007 Jan 12.
2
How repetitive are genomes?
BMC Bioinformatics. 2006 Dec 22;7:541. doi: 10.1186/1471-2105-7-541.
3
Global variation in copy number in the human genome.
Nature. 2006 Nov 23;444(7118):444-54. doi: 10.1038/nature05329.
4
Computing distribution of scale independent motifs in biological sequences.
Algorithms Mol Biol. 2006 Oct 18;1:18. doi: 10.1186/1748-7188-1-18.
5
Repeats and correlations in human DNA sequences.
Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Jun;67(6 Pt 1):061913. doi: 10.1103/PhysRevE.67.061913. Epub 2003 Jun 26.
6
Evaluation of the current models for the evolution of bacterial DNA uptake signal sequences.
J Theor Biol. 2006 Jan 7;238(1):157-66. doi: 10.1016/j.jtbi.2005.05.024. Epub 2005 Jul 14.
7
Genome comparison without alignment using shortest unique substrings.
BMC Bioinformatics. 2005 May 23;6:123. doi: 10.1186/1471-2105-6-123.
8
The spectrum of genomic signatures: from dinucleotides to chaos game representation.
Gene. 2005 Feb 14;346:173-85. doi: 10.1016/j.gene.2004.10.021.
9
Detection and characterization of horizontal transfers in prokaryotes using genomic signature.
Nucleic Acids Res. 2005 Jan 13;33(1):e6. doi: 10.1093/nar/gni004.
10
Rényi continuous entropy of DNA sequences.
J Theor Biol. 2004 Dec 7;231(3):377-88. doi: 10.1016/j.jtbi.2004.06.030.
