School of Science, Zhejiang Sci-Tech University, Hangzhou, China.
College of Life Science, Zhejiang Sci-Tech University, Hangzhou, China.
Comb Chem High Throughput Screen. 2022;25(3):381-391. doi: 10.2174/1386207323666201012142318.
Comparing the similarity of biological sequences is an important task in bioinformatics. Methods for this comparison fall into two classes: sequence alignment methods and alignment-free methods. Graphical representation of biological sequences is one kind of alignment-free method, and it provides a tool for analyzing and visualizing the sequences. This article proposes a generalized iterative map of protein sequences for analyzing the similarity of biological sequences.
Based on the normalized physicochemical indexes of the 20 amino acids, each amino acid is mapped to a point in 5D space. A generalized iterative function system is introduced to construct a generalized iterative map of protein sequences, which not only reflects various physicochemical properties of amino acids but also allows a different compression ratio for each component of the map. Several properties are proved to illustrate the advantages of the generalized iterative map. A mathematical description of the map is proposed for comparing the similarities and dissimilarities of protein sequences. Using this method, similarities/dissimilarities were compared among ND5 protein sequences and among ND6 protein sequences of ten different species.
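The construction described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the update rule x_n = x_{n-1} + r_k (p_k(a_n) - x_{n-1}) applied per component k, the function names, the placeholder index table (random numbers, not real physicochemical indexes), and the use of the trajectory's mean point as a sequence descriptor.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def placeholder_index_table(seed=0):
    """Assign each amino acid a made-up point in [0, 1]^5.

    These values are random placeholders, NOT the normalized
    physicochemical indexes used in the paper.
    """
    rng = random.Random(seed)
    return {aa: tuple(rng.random() for _ in range(5)) for aa in AMINO_ACIDS}


def iterative_map(sequence, table, ratios=(0.5,) * 5, start=(0.5,) * 5):
    """Assumed update rule: move each component k of the current point
    toward the residue's point by its own compression ratio ratios[k]."""
    point = list(start)
    trajectory = []
    for residue in sequence:
        target = table[residue]
        point = [x + r * (t - x) for x, r, t in zip(point, ratios, target)]
        trajectory.append(tuple(point))
    return trajectory


def mean_point(trajectory):
    """Hypothetical descriptor: the centroid of the 5D trajectory."""
    n = len(trajectory)
    return tuple(sum(p[k] for p in trajectory) / n for k in range(5))


def euclidean(u, v):
    """Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


table = placeholder_index_table()
traj_a = iterative_map("MKLVAG", table)  # toy sequences, not ND5/ND6
traj_b = iterative_map("MKLVSG", table)
distance = euclidean(mean_point(traj_a), mean_point(traj_b))
```

With every ratio set to 0.5 this reduces to a chaos-game style walk in 5D; choosing a different ratio per component is one plausible reading of "different compression ratios of the component".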
Correlation analysis was used to compare the ClustalW results with our similarity/dissimilarity results and with the results of other graphical representations. The comparison shows that our approach correlates better with ClustalW for all species than the other approaches do, which illustrates its effectiveness.
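As a rough illustration of such a correlation analysis, the sketch below computes a Pearson correlation coefficient between two lists of pairwise distances. The abstract does not say which coefficient was used, so Pearson is an assumption, and the numbers are toy values rather than ClustalW output or results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (norm_x * norm_y)


# Hypothetical distances from one reference species to four others.
reference_distances = [10.0, 20.0, 30.0, 40.0]  # stand-in for an alignment-based tool
method_distances = [0.11, 0.19, 0.32, 0.41]     # stand-in for a graphical method
r = pearson(reference_distances, method_distances)
```

A value of `r` close to 1 would mean the two methods rank and space the species similarly; repeating this for each reference species gives one coefficient per species to compare across methods.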
The two examples show that our method performs well in the similarity/dissimilarity analysis of protein sequences and does not require complex computation.