A novel model for DNA sequence similarity analysis based on graph theory.

Affiliation

School of Mathematics and Statistics, Shandong University at Weihai, Weihai, China, 264209.

Publication information

Evol Bioinform Online. 2011;7:149-58. doi: 10.4137/EBO.S7364. Epub 2011 Oct 4.

DOI: 10.4137/EBO.S7364

PMID: 22065497

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3204935/

Abstract

Determining sequence similarity is one of the major steps in computational phylogenetic studies. During evolution, DNA sequences undergo not only point mutations of individual nucleotides but also subsequent rearrangements. A major task for computational biologists has therefore been to develop mathematical descriptors for similarity analysis that capture information about these different mutation phenomena simultaneously. Unlike traditional methods, which build mathematical descriptors from quantities such as nucleotide frequencies or geometric representations, we construct novel descriptors based on graph theory. Specifically, we construct a weighted directed graph for each DNA sequence, and the adjacency matrix of this graph induces a representative vector for the sequence. Because the new approach measures similarity using both the ordering and the frequency of nucleotides, it incorporates considerably more information. As an application, we test the method on a set of 0.9-kb mtDNA sequences from twelve primate species. The phylogenetic trees obtained with various distance estimations all share the same topology and are generally consistent with results reported in earlier studies, which demonstrates the effectiveness of the new method. We also test the method on a simulated data set; the results show that it outperforms the traditional global alignment method when subsequent rearrangements occur frequently during evolutionary history.
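The abstract does not spell out how arc weights are assigned or exactly how the adjacency matrix becomes the representative vector, so the Python sketch below fills those gaps with stated assumptions rather than the authors' published formulas. It assumes the four nucleotides A, C, G, T are the vertices; that the arc from X to Y accumulates 1/(j - i) for every pair of positions i < j with X at i and Y at j (so both ordering and frequency matter); and that the representative vector is the flattened 4x4 matrix divided by sequence length. The function names `adjacency_matrix`, `descriptor`, and `distance_matrix`, and the two metrics shown, are hypothetical illustrations.

```python
import math
from itertools import combinations

# Vertices of the weighted directed graph: the four nucleotides.
NUCLEOTIDES = "ACGT"
INDEX = {base: k for k, base in enumerate(NUCLEOTIDES)}


def adjacency_matrix(seq):
    """Return the 4x4 weighted adjacency matrix of a DNA sequence.

    Assumed weighting (illustrative only): for every pair of positions
    i < j, the arc seq[i] -> seq[j] gains weight 1 / (j - i), so close
    neighbours count more than distant ones. Runs in O(n^2) time.
    """
    seq = seq.upper()
    w = [[0.0] * 4 for _ in range(4)]
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            a, b = seq[i], seq[j]
            if a in INDEX and b in INDEX:  # skip ambiguous symbols such as N
                w[INDEX[a]][INDEX[b]] += 1.0 / (j - i)
    return w


def descriptor(seq):
    """Flatten the adjacency matrix row by row into a 16-dimensional vector,
    divided by sequence length (an assumed, simple scaling)."""
    n = max(len(seq), 1)
    return [x / n for row in adjacency_matrix(seq) for x in row]


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def distance_matrix(seqs, metric=euclidean):
    """Symmetric pairwise distances, keyed by (name_a, name_b)."""
    vecs = {name: descriptor(s) for name, s in seqs.items()}
    d = {(name, name): 0.0 for name in vecs}
    for a, b in combinations(vecs, 2):
        d[(a, b)] = d[(b, a)] = metric(vecs[a], vecs[b])
    return d


if __name__ == "__main__":
    # Toy sequences (hypothetical): s2 differs from s1 at one site.
    seqs = {"s1": "ATGCGTAC", "s2": "ATGCGTAA", "s3": "GGGGCCCC"}
    for metric in (euclidean, cosine_distance):
        d = distance_matrix(seqs, metric)
        print(metric.__name__, round(d[("s1", "s2")], 3), round(d[("s1", "s3")], 3))
```

Because the arc from X to Y is counted separately from the arc from Y to X, the matrix is generally not symmetric, which is how this sketch keeps ordering information that a plain nucleotide-frequency vector would lose. A pairwise distance matrix like this one could then be passed to a standard distance-based tree method such as UPGMA or neighbor joining; which tree-building method and which distance estimations the authors used is not stated in the abstract.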

Similar articles

A Novel Method for Alignment-free DNA Sequence Similarity Analysis Based on the Characterization of Complex Networks.

Evol Bioinform Online. 2016 Oct 6;12:229-235. doi: 10.4137/EBO.S40474. eCollection 2016.

A new graph-theoretic approach to determine the similarity of genome sequences based on nucleotide triplets.

Genomics. 2020 Nov;112(6):4701-4714. doi: 10.1016/j.ygeno.2020.08.023. Epub 2020 Aug 19.

Graph-based analysis of DNA sequence comparison in closed cotton species: A generalized method to unveil genetic connections.

PLoS One. 2024 Sep 17;19(9):e0306608. doi: 10.1371/journal.pone.0306608. eCollection 2024.

Alignment-free sequence comparison using N-dimensional similarity space.

Curr Comput Aided Drug Des. 2010 Dec;6(4):290-6. doi: 10.2174/1573409911006040290.

An improved model for whole genome phylogenetic analysis by Fourier transform.

J Theor Biol. 2015 Oct 7;382:99-110. doi: 10.1016/j.jtbi.2015.06.033. Epub 2015 Jul 4.

J Mol Graph Model. 2020 Sep;99:107603. doi: 10.1016/j.jmgm.2020.107603. Epub 2020 May 3.

Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation.

Comb Chem High Throughput Screen. 2018;21(2):100-110. doi: 10.2174/1386207321666180130100838.

Multiple genome rearrangement: a general approach via the evolutionary genome graph.

Bioinformatics. 2002;18 Suppl 1:S303-11. doi: 10.1093/bioinformatics/18.suppl_1.s303.

Extension of molecular similarity analysis approach to classification of DNA sequences using DNA descriptors.

SAR QSAR Environ Res. 2011 Mar;22(1-2):21-34. doi: 10.1080/1062936X.2010.528255.

Cited by

Tokenvizz: GraphRAG-Inspired Tokenization Tool for Genomic Data Discovery and Visualization.

bioRxiv. 2024 Dec 6:2024.12.03.626631. doi: 10.1101/2024.12.03.626631.

Determination of k-mer density in a DNA sequence and subsequent cluster formation algorithm based on the application of electronic filter.

Sci Rep. 2021 Jul 1;11(1):13701. doi: 10.1038/s41598-021-93154-3.

A new graph-theoretic approach to determine the similarity of genome sequences based on nucleotide triplets.

Genomics. 2020 Nov;112(6):4701-4714. doi: 10.1016/j.ygeno.2020.08.023. Epub 2020 Aug 19.

ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels.

BMC Genomics. 2019 Apr 3;20(1):267. doi: 10.1186/s12864-019-5571-y.

A novel model for protein sequence similarity analysis based on spectral radius.

J Theor Biol. 2018 Jun 7;446:61-70. doi: 10.1016/j.jtbi.2018.03.001. Epub 2018 Mar 7.

A Novel Method for Alignment-free DNA Sequence Similarity Analysis Based on the Characterization of Complex Networks.

Evol Bioinform Online. 2016 Oct 6;12:229-235. doi: 10.4137/EBO.S40474. eCollection 2016.

Bioinformatics studies of Influenza A hemagglutinin sequence data indicate recombination-like events leading to segment exchanges.

BMC Res Notes. 2016 Apr 15;9:222. doi: 10.1186/s13104-016-2017-3.

Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

PLoS One. 2014 Nov 13;9(11):e110954. doi: 10.1371/journal.pone.0110954. eCollection 2014.

A 2D graphical representation of the sequences of DNA based on triplets and its application.

EURASIP J Bioinform Syst Biol. 2014 Jan 2;2014(1):1. doi: 10.1186/1687-4153-2014-1.

Effective Encoding for DNA Sequence Visualization Based on Nucleotide's Ring Structure.

Evol Bioinform Online. 2013 Jul 7;9:251-61. doi: 10.4137/EBO.S12160. Print 2013.
