A novel model for protein sequence similarity analysis based on spectral radius.

Author information

Wu Chuanyan, Gao Rui, De Marinis Yang, Zhang Yusen

Affiliation

School of Control Science and Engineering, Shandong University, Jinan 250061, China.

Publication information

J Theor Biol. 2018 Jun 7;446:61-70. doi: 10.1016/j.jtbi.2018.03.001. Epub 2018 Mar 7.

DOI:10.1016/j.jtbi.2018.03.001

PMID:29524440

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7094169/

Abstract

Advances in sequencing technologies have led to a rapid increase in the number and diversity of biological sequences, which has facilitated sequence research. In this paper, we present a new method for analyzing protein sequence similarity. We calculated the spectral radii of the 20 amino acids (AAs) and put forward a novel 2-D graphical representation of protein sequences. To characterize protein sequences numerically, we extracted three groups of features, relating to statistical measurements, dynamics measurements, and fluctuation complexity of the sequences. With the resulting feature vector, two models, one using Gaussian kernel similarity and one using cosine similarity, were built to measure the similarity between sequences. We applied our method to analyze the similarities/dissimilarities of four data sets. Both proposed models produced consistent results, with improvements over those obtained by ClustalW analysis. The approach presented in this study may therefore benefit protein research in medical and scientific fields.
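The abstract names two similarity measures but gives no formulas. As a rough illustration only, the sketch below uses the textbook definitions of Gaussian kernel similarity and cosine similarity on two numeric feature vectors. The function names, the `sigma` bandwidth and its default of 1.0, and the example vectors are all assumptions made for illustration; they are not taken from the paper, and the paper's actual feature extraction is not reproduced here.

```python
import math


def gaussian_kernel_similarity(x, y, sigma=1.0):
    """Textbook Gaussian kernel: exp(-||x - y||^2 / (2 * sigma^2)).

    sigma is a hypothetical bandwidth; the abstract does not state one.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y: (x . y) / (||x|| * ||y||)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical feature vectors standing in for two protein sequences.
features_a = [0.8, 1.2, 0.5]
features_b = [0.7, 1.0, 0.6]

print(gaussian_kernel_similarity(features_a, features_b))
print(cosine_similarity(features_a, features_b))
```

Under these definitions, identical vectors score 1.0 with both measures. The Gaussian kernel decays with Euclidean distance and so is sensitive to feature magnitudes, while cosine similarity depends only on direction, which may be one reason to compare the two.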

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/78b3/7094169/c6996a24406f/fx1_lrg.jpg

Similar articles

A Generalized Iterative Map for Analysis of Protein Sequences.

Comb Chem High Throughput Screen. 2022;25(3):381-391. doi: 10.2174/1386207323666201012142318.

A novel method to analyze the similarity of biological sequences.

J Biomol Struct Dyn. 2009 Apr;26(5):599-608. doi: 10.1080/07391102.2009.10507275.

Normalized feature vectors: a novel alignment-free sequence comparison method based on the numbers of adjacent amino acids.

IEEE/ACM Trans Comput Biol Bioinform. 2013 Mar-Apr;10(2):457-67. doi: 10.1109/TCBB.2013.10.

ADLD: a novel graphical representation of protein sequences and its application.

Comput Math Methods Med. 2014;2014:959753. doi: 10.1155/2014/959753. Epub 2014 Oct 30.

Detailed protein sequence alignment based on Spectral Similarity Score (SSS).

BMC Bioinformatics. 2005 Apr 23;6:105. doi: 10.1186/1471-2105-6-105.

Mapping sequence to feature vector using numerical representation of codons targeted to amino acids for alignment-free sequence analysis.

Gene. 2021 Jan 15;766:145096. doi: 10.1016/j.gene.2020.145096. Epub 2020 Sep 9.

A 3D graphical representation of protein sequences based on the Gray code.

J Theor Biol. 2012 Jul 7;304:81-7. doi: 10.1016/j.jtbi.2012.03.023. Epub 2012 Apr 1.

A new method to analyze the similarity of protein structure using TOPS representations.

J Biomol Struct Dyn. 2008 Dec;26(3):367-74. doi: 10.1080/07391102.2008.10507251.

A Fractal Dimension and Wavelet Transform Based Method for Protein Sequence Similarity Analysis.

IEEE/ACM Trans Comput Biol Bioinform. 2015 Mar-Apr;12(2):348-59. doi: 10.1109/TCBB.2014.2363480.

Cited by

Use of 2D FFT and DTW in Protein Sequence Comparison.

Protein J. 2024 Feb;43(1):1-11. doi: 10.1007/s10930-023-10160-2. Epub 2023 Oct 17.

Classification of Protein Sequences by a Novel Alignment-Free Method on Bacterial and Virus Families.

Genes (Basel). 2022 Sep 27;13(10):1744. doi: 10.3390/genes13101744.

References

Protein Sequence Comparison Based on Physicochemical Properties and the Position-Feature Energy Matrix.

Sci Rep. 2017 Apr 10;7:46237. doi: 10.1038/srep46237.

A new method to analyze protein sequence similarity using Dynamic Time Warping.

Genomics. 2017 Mar;109(2):123-130. doi: 10.1016/j.ygeno.2016.12.002. Epub 2016 Dec 11.

A Novel Method for Alignment-free DNA Sequence Similarity Analysis Based on the Characterization of Complex Networks.

Evol Bioinform Online. 2016 Oct 6;12:229-235. doi: 10.4137/EBO.S40474. eCollection 2016.

ProtNN: fast and accurate protein 3D-structure classification in structural and topological space.

BioData Min. 2016 Sep 23;9:30. doi: 10.1186/s13040-016-0108-2. eCollection 2016.

Protein sequence analysis by incorporating modified chaos game and physicochemical properties into Chou's general pseudo amino acid composition.

J Theor Biol. 2016 Oct 7;406:105-15. doi: 10.1016/j.jtbi.2016.06.034. Epub 2016 Jun 29.

20D-dynamic representation of protein sequences.

Genomics. 2016 Jan;107(1):16-23. doi: 10.1016/j.ygeno.2015.12.003. Epub 2015 Dec 17.

Number of distinct sequence alignments with k-match and match sections.

Comput Biol Med. 2015 Aug;63:287-92. doi: 10.1016/j.compbiomed.2015.02.017. Epub 2015 Mar 6.

Novel numerical characterization of protein sequences based on individual amino acid and its application.

Biomed Res Int. 2015;2015:909567. doi: 10.1155/2015/909567. Epub 2015 Feb 2.

An efficient numerical method for protein sequences similarity analysis based on a new two-dimensional graphical representation.

SAR QSAR Environ Res. 2015;26(2):125-37. doi: 10.1080/1062936X.2014.995700.

ADLD: a novel graphical representation of protein sequences and its application.

Comput Math Methods Med. 2014;2014:959753. doi: 10.1155/2014/959753. Epub 2014 Oct 30.
