Comparative testing of DNA segmentation algorithms using benchmark simulations.

Affiliation

Department of Biology & Biochemistry, University of Houston, TX, USA.

Publication information

Mol Biol Evol. 2010 May;27(5):1015-24. doi: 10.1093/molbev/msp307. Epub 2009 Dec 16.

PMID:20018981

Abstract

Numerous segmentation methods for the detection of compositionally homogeneous domains within genomic sequences have been proposed. Unfortunately, these methods yield inconsistent results. Here, we present a benchmark consisting of two sets of simulated genomic sequences for testing the performances of segmentation algorithms. Sequences in the first set are composed of fixed-sized homogeneous domains, distinct in their between-domain guanine and cytosine (GC) content variability. The sequences in the second set are composed of a mosaic of many short domains and a few long ones, distinguished by sharp GC content boundaries between neighboring domains. We use these sets to test the performance of seven segmentation algorithms in the literature. Our results show that recursive segmentation algorithms based on the Jensen-Shannon divergence outperform all other algorithms. However, even these algorithms perform poorly in certain instances because of the arbitrary choice of a segmentation-stopping criterion.
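To make the recursive Jensen-Shannon approach concrete, here is a minimal Python sketch. It is not the authors' benchmark code or any of the seven tested algorithms. It assumes a two-symbol alphabet (GC versus AT), and the names `simulate_domain`, `best_split`, and `segment`, along with the fixed `threshold` and `min_len` stopping parameters, are hypothetical choices for illustration. Published methods typically stop on a statistical significance test rather than a raw divergence cutoff.

```python
import math
import random


def entropy(p):
    """Shannon entropy in bits of a two-symbol distribution with P(GC) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def simulate_domain(length, gc, rng):
    """Homogeneous domain: each base is G or C with probability gc (assumed model)."""
    return "".join(
        rng.choice("GC") if rng.random() < gc else rng.choice("AT")
        for _ in range(length)
    )


def best_split(seq):
    """Return (position, D_JS) for the cut that maximizes the
    length-weighted Jensen-Shannon divergence between the two sides."""
    n = len(seq)
    is_gc = [1 if base in "GC" else 0 for base in seq]
    total = sum(is_gc)
    h_whole = entropy(total / n)
    best_pos, best_d = None, -1.0
    left = 0
    for i in range(1, n):
        left += is_gc[i - 1]
        right = total - left
        d = (h_whole
             - (i / n) * entropy(left / i)
             - ((n - i) / n) * entropy(right / (n - i)))
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def segment(seq, threshold=0.02, min_len=10, offset=0):
    """Recursively split seq and return sorted boundary positions.

    Stopping rule (an arbitrary, hypothetical choice): stop when the best
    D_JS falls below `threshold` or a side would be shorter than `min_len`.
    """
    if len(seq) < 2 * min_len:
        return []
    pos, d = best_split(seq)
    if d < threshold or pos < min_len or len(seq) - pos < min_len:
        return []
    return (segment(seq[:pos], threshold, min_len, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, min_len, offset + pos))


if __name__ == "__main__":
    rng = random.Random(0)
    # Two simulated domains with a sharp GC-content boundary at position 2000.
    seq = simulate_domain(2000, 0.30, rng) + simulate_domain(2000, 0.60, rng)
    print(segment(seq))
```

Changing `threshold` changes how many boundaries are reported for the same sequence, which is the sensitivity to the stopping criterion that the abstract points out.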

Similar articles

Isochore structures in the chicken genome.

FEBS J. 2006 Apr;273(8):1637-48. doi: 10.1111/j.1742-4658.2006.05178.x.

Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm.

Nucleic Acids Res. 2010 Aug;38(15):e158. doi: 10.1093/nar/gkq532. Epub 2010 Jun 22.

Compositional searching of CpG islands in the human genome.

Phys Rev E Stat Nonlin Soft Matter Phys. 2005 Jun;71(6 Pt 1):061925. doi: 10.1103/PhysRevE.71.061925. Epub 2005 Jun 29.

A comparison study: applying segmentation to array CGH data for downstream analyses.

Bioinformatics. 2005 Nov 15;21(22):4084-91. doi: 10.1093/bioinformatics/bti677. Epub 2005 Sep 13.

Fast model-based protein homology detection without alignment.

Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.

Efficient identification of DNA hybridization partners in a sequence database.

Bioinformatics. 2006 Jul 15;22(14):e350-8. doi: 10.1093/bioinformatics/btl240.

Numerical characterization of DNA sequences based on digital signal method.

Comput Biol Med. 2009 Apr;39(4):388-91. doi: 10.1016/j.compbiomed.2009.01.009. Epub 2009 Mar 3.

Discovering isochores by least-squares optimal segmentation.

Gene. 2007 Jun 1;394(1-2):53-60. doi: 10.1016/j.gene.2007.01.028. Epub 2007 Feb 16.

Segmentation algorithm for DNA sequences.

Phys Rev E Stat Nonlin Soft Matter Phys. 2005 Oct;72(4 Pt 1):041917. doi: 10.1103/PhysRevE.72.041917. Epub 2005 Oct 17.

Cited by

Compositional Structure of the Genome: A Review.

Biology (Basel). 2023 Jun 13;12(6):849. doi: 10.3390/biology12060849.

Nanodosimetric Calculations of Radiation-Induced DNA Damage in a New Nucleus Geometrical Model Based on the Isochore Theory.

Int J Mol Sci. 2022 Mar 29;23(7):3770. doi: 10.3390/ijms23073770.

Extreme genome diversity in the hyper-prevalent parasitic eukaryote Blastocystis.

PLoS Biol. 2017 Sep 11;15(9):e2003769. doi: 10.1371/journal.pbio.2003769. eCollection 2017 Sep.

OcculterCut: A Comprehensive Survey of AT-Rich Regions in Fungal Genomes.

Genome Biol Evol. 2016 Jul 3;8(6):2044-64. doi: 10.1093/gbe/evw121.

Segmenting the Human Genome into Isochores.

Evol Bioinform Online. 2015 Nov 25;11:253-61. doi: 10.4137/EBO.S27693. eCollection 2015.

IsoPlotter(+): A Tool for Studying the Compositional Architecture of Genomes.

ISRN Bioinform. 2013 Apr 18;2013:725434. doi: 10.1155/2013/725434. eCollection 2013.

A comparative study and a phylogenetic exploration of the compositional architectures of mammalian nuclear genomes.

PLoS Comput Biol. 2014 Nov 6;10(11):e1003925. doi: 10.1371/journal.pcbi.1003925. eCollection 2014 Nov.

Investigating genomic structure using changept: A Bayesian segmentation model.

Comput Struct Biotechnol J. 2014 Aug 27;10(17):107-15. doi: 10.1016/j.csbj.2014.08.003. eCollection 2014 Jul.

Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm.

Nucleic Acids Res. 2010 Aug;38(15):e158. doi: 10.1093/nar/gkq532. Epub 2010 Jun 22.
