Wu Chuanyan, Gao Rui, De Marinis Yang, Zhang Yusen
School of Control Science and Engineering, Shandong University, Jinan 250061, China.
J Theor Biol. 2018 Jun 7;446:61-70. doi: 10.1016/j.jtbi.2018.03.001. Epub 2018 Mar 7.
Advances in sequencing technologies have led to a rapid increase in the number and diversity of biological sequences, which has facilitated progress in sequence research. In this paper, we present a new method for analyzing protein sequence similarity. We calculated the spectral radii of the 20 amino acids (AAs) and put forward a novel 2-D graphical representation of protein sequences. To characterize protein sequences numerically, three groups of features were extracted, relating to statistical measurements, dynamics measurements, and fluctuation complexity of the sequences. With the resulting feature vector, two models using Gaussian kernel similarity and cosine similarity were built to measure the similarity between sequences. We applied our method to analyze the similarities/dissimilarities of four data sets. Both proposed models produced consistent results, with improvements over those obtained by ClustalW analysis. The novel approach presented in this study may therefore benefit protein research in medical and scientific fields.
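The two similarity measures named in the abstract can be illustrated with a minimal sketch. It assumes each protein has already been reduced to a fixed-length numeric feature vector; the feature extraction itself (spectral radii, 2-D graphical representation) is not reproduced here. The function names, the example vectors, and the bandwidth parameter `sigma` are hypothetical choices for illustration, and the Gaussian kernel is written in its common textbook form, which may differ from the exact formulation used in the paper.

```python
import math


def gaussian_kernel_similarity(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2)).

    sigma is an assumed bandwidth parameter; the abstract does not state one.
    Returns a value in (0, 1], with 1 meaning identical vectors.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y: (x . y) / (||x|| * ||y||)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical feature vectors for two sequences (not data from the paper).
    features_a = [0.42, 1.10, 0.35]
    features_b = [0.40, 1.05, 0.50]
    print(gaussian_kernel_similarity(features_a, features_b))
    print(cosine_similarity(features_a, features_b))
```

The Gaussian kernel depends on the distance between vectors and so is sensitive to their magnitudes, while cosine similarity depends only on direction. That difference is one plausible reason to build and compare both models.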