Similarity/dissimilarity calculation methods of DNA sequences: A survey.

Author information

Jin Xin, Jiang Qian, Chen Yanyan, Lee Shin-Jye, Nie Rencan, Yao Shaowen, Zhou Dongming, He Kangjian

Affiliations

School of Information, Yunnan University, Kunming, Yunnan Province, China.

School of Life Sciences, Yunnan University, Kunming, Yunnan Province, China.

Publication information

J Mol Graph Model. 2017 Sep;76:342-355. doi: 10.1016/j.jmgm.2017.07.019. Epub 2017 Jul 20.

DOI:10.1016/j.jmgm.2017.07.019

PMID:28763687

Abstract

DNA sequence similarity/dissimilarity analysis is a fundamental task in computational biology: it compares DNA sequences in order to infer their evolutionary relationships. Over the past decades, growing demand has led to a large number of similarity analysis methods for DNA sequences. To take stock of these advances and to promote further development of the field, we present a survey. We first introduce the background of DNA similarity analysis, including the data sets, similarity distances, and output data. We then review recent algorithmic developments to present the state of the art in DNA similarity analysis. Finally, we summarize the trends and challenges in this research area. The survey concludes that, although various DNA similarity analysis methods have been proposed, there is still room for further improvement and there remain several potential research directions.
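The abstract names three ingredients of a similarity analysis (input sequences, a similarity distance, and output data) without specifying any particular method. As a rough illustration only, the sketch below assumes one simple alignment-free choice that is not taken from the survey: each sequence is mapped to a normalized k-mer frequency vector, and the Euclidean distance between vectors serves as the dissimilarity. The function names `kmer_profile`, `euclidean_distance`, and `distance_matrix`, and the default `k=2`, are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=2):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    Assumption: only k-mers made of A, C, G, T are counted; windows that
    contain any other symbol (for example N) are ignored.
    """
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed ordering of all 4**k possible k-mers, so vectors are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence has no valid k-mers")
    return [counts[m] / total for m in kmers]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs, k=2):
    """Pairwise dissimilarity matrix: the 'output data' of this sketch."""
    profiles = [kmer_profile(s, k) for s in seqs]
    return [[euclidean_distance(p, q) for q in profiles] for p in profiles]
```

A distance matrix of this kind is a typical input to hierarchical clustering or tree building; identical sequences get distance 0, and the matrix is symmetric by construction.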


Similar articles

Protein secondary structure prediction: A survey of the state of the art.

J Mol Graph Model. 2017 Sep;76:379-402. doi: 10.1016/j.jmgm.2017.07.015. Epub 2017 Jul 19.

ADLD: a novel graphical representation of protein sequences and its application.

Comput Math Methods Med. 2014;2014:959753. doi: 10.1155/2014/959753. Epub 2014 Oct 30.

Statistical measures of DNA sequence dissimilarity under Markov chain models of base composition.

Biometrics. 2001 Jun;57(2):441-8. doi: 10.1111/j.0006-341x.2001.00441.x.

A novel alignment-free DNA sequence similarity analysis approach based on top-k n-gram match-up.

J Mol Graph Model. 2020 Nov;100:107693. doi: 10.1016/j.jmgm.2020.107693. Epub 2020 Aug 7.

An improved model for whole genome phylogenetic analysis by Fourier transform.

J Theor Biol. 2015 Oct 7;382:99-110. doi: 10.1016/j.jtbi.2015.06.033. Epub 2015 Jul 4.

J Chem Inf Comput Sci. 2000 May-Jun;40(3):599-606. doi: 10.1021/ci9901082.

A measure of DNA sequence similarity by Fourier Transform with applications on hierarchical clustering.

J Theor Biol. 2014 Oct 21;359:18-28. doi: 10.1016/j.jtbi.2014.05.043. Epub 2014 Jun 6.

PNN-curve: a new 2D graphical representation of DNA sequences and its application.

J Theor Biol. 2006 Dec 21;243(4):555-61. doi: 10.1016/j.jtbi.2006.07.018. Epub 2006 Jul 24.

Vector representations and related matrices of DNA primary sequence based on L-tuple.

Math Biosci. 2010 Oct;227(2):147-52. doi: 10.1016/j.mbs.2010.07.004. Epub 2010 Aug 3.

Cited by

NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling.

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf212.

Use of 2D FFT and DTW in Protein Sequence Comparison.

Protein J. 2024 Feb;43(1):1-11. doi: 10.1007/s10930-023-10160-2. Epub 2023 Oct 17.

Classification Maps: A New Mathematical Tool Supporting the Diagnosis of Age-Related Macular Degeneration.

J Pers Med. 2023 Jun 29;13(7):1074. doi: 10.3390/jpm13071074.

4D-Dynamic Representation of DNA/RNA Sequences: Studies on Genetic Diversity of in Red Foxes in Poland.

Life (Basel). 2022 Jun 10;12(6):877. doi: 10.3390/life12060877.

Non-standard bioinformatics characterization of SARS-CoV-2.

Comput Biol Med. 2021 Apr;131:104247. doi: 10.1016/j.compbiomed.2021.104247. Epub 2021 Feb 1.

Biomed Res Int. 2019 Nov 22;2019:2796971. doi: 10.1155/2019/2796971. eCollection 2019.

A Statistical Similarity/Dissimilarity Analysis of Protein Sequences Based on a Novel Group Representative Vector.

Biomed Res Int. 2019 May 8;2019:8702968. doi: 10.1155/2019/8702968. eCollection 2019.
