Gupta Kshitiz, Thomas Dina, Vidya S V, Venkatesh K V, Ramakumar S
Department of Computer Science & Engineering, Indian Institute of Technology, Bombay, Mumbai, India.
BMC Bioinformatics. 2005 Apr 23;6:105. doi: 10.1186/1471-2105-6-105.
The chemical properties and biological function of a protein are a direct consequence of its primary structure. Several algorithms have been developed that determine the alignment and similarity of primary protein sequences. However, character-based similarity cannot provide insight into the structural aspects of a protein. We present a method based on spectral similarity for comparing subsequences of amino acids that behave similarly but do not align well when amino acids are treated as mere characters. The approach computes a similarity score between sequences from any given per-residue attribute, such as the hydrophobicity of amino acids, using spectral information obtained after partial conversion to the frequency domain.
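To make the idea concrete, here is a minimal sketch of one way an attribute-based spectral score could be computed. It is not the authors' SSS. It assumes the Kyte-Doolittle hydrophobicity scale as the attribute, equal-length inputs, a discrete Fourier transform (DFT) restricted to the non-redundant bins as the "partial conversion", and cosine similarity of the magnitude spectra as the score. The names `attribute_signal`, `magnitude_spectrum` and `spectral_similarity` are hypothetical.

```python
import math

# Kyte-Doolittle hydrophobicity scale: an assumed choice of per-residue attribute.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def attribute_signal(seq, scale=KYTE_DOOLITTLE):
    """Map each residue to its numeric attribute value."""
    return [scale[aa] for aa in seq.upper()]


def magnitude_spectrum(signal):
    """|X_k| of the mean-centred signal for the non-redundant DFT bins 0..n//2."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centred))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centred))
        spectrum.append(math.hypot(re, im))  # the sign of im does not affect |X_k|
    return spectrum


def spectral_similarity(seq_a, seq_b):
    """Cosine similarity of the two magnitude spectra (hypothetical score in [0, 1])."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes equal-length sequences")
    a = magnitude_spectrum(attribute_signal(seq_a))
    b = magnitude_spectrum(attribute_signal(seq_b))
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# LIVLIV and IVLIVL share no residue at any position, yet one is a circular
# shift of the other, so their magnitude spectra coincide.
print(round(spectral_similarity("LIVLIV", "IVLIVL"), 6))  # 1.0
# DKDKDK has a period-2 hydrophobicity pattern; LIVLIV has period 3.
print(round(spectral_similarity("LIVLIV", "DKDKDK"), 6))  # 0.0
```

Because the DFT magnitude discards both the residue identities and any circular shift, this toy score rewards a shared attribute pattern rather than shared characters. A practical version would also have to handle sequences of different lengths, for example by comparing fixed-width windows.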
Distance matrices computed for various branches of the human kinome (the full complement of human kinases) matched the phylogenetic tree of the human kinome, establishing the efficacy of the algorithm for global alignment. The kinases PKCd and PKCe share close biological properties and structural similarities but do not score highly with character-based alignments. Detailed comparison established close similarities between subsequences that have no significant character identity. We compared their known 3D structures to establish that the algorithm picks out subsequences that character-based matching algorithms do not consider similar but that share structural similarities. Similarly, many subsequences with low character identity were picked out between the F/10 xylanases xyna-theau and xyna-clotm. Comparison of the 3D structures of these subsequences confirmed their structural similarity.
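The local comparison described above can likewise be sketched as a sliding-window search that keeps window pairs whose attribute spectra are similar but whose character identity is low. This is a hypothetical illustration, not the procedure used for the PKCd/PKCe or xylanase comparisons: the window width, the thresholds `min_sim` and `max_identity`, the Kyte-Doolittle attribute and the cosine-of-spectra score are all assumptions, and `pick_subsequences` is an invented name.

```python
import math

# Kyte-Doolittle hydrophobicity values (assumed attribute).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def spectrum(window):
    """DFT magnitudes (bins 0..n//2) of the window's mean-centred attribute profile."""
    vals = [KD[aa] for aa in window]
    n = len(vals)
    mean = sum(vals) / n
    centred = [v - mean for v in vals]
    return [
        math.hypot(
            sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centred)),
            sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centred)),
        )
        for k in range(n // 2 + 1)
    ]


def cosine(a, b):
    """Cosine similarity of two spectra; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identity(a, b):
    """Fraction of positions at which two equal-length windows share a residue."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_subsequences(seq_a, seq_b, width, min_sim=0.95, max_identity=0.2):
    """Return (start_a, start_b, similarity) for spectrally similar, low-identity windows."""
    hits = []
    for i in range(len(seq_a) - width + 1):
        win_a = seq_a[i:i + width]
        spec_a = spectrum(win_a)
        for j in range(len(seq_b) - width + 1):
            win_b = seq_b[j:j + width]
            sim = cosine(spec_a, spectrum(win_b))
            if sim >= min_sim and identity(win_a, win_b) <= max_identity:
                hits.append((i, j, round(sim, 3)))
    return hits


print(pick_subsequences("LIVLIV", "IVLIVL", width=6))  # [(0, 0, 1.0)]
print(pick_subsequences("LIVLIV", "LIVLIV", width=6))  # [] (high identity is filtered out)
```

The identity filter is what distinguishes this search from ordinary alignment: pairs that a character-based tool would already match are deliberately discarded, so only attribute-level similarities remain.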
We developed an algorithm inspired by the successful application of spectral similarity to music sequences. The method captures subsequences that traditional character-based alignment tools fail to align but that give rise to similar secondary and tertiary structures. The Spectral Similarity Score (SSS) extends conventional similarity methods, and the results indicate that it holds strong potential for the analysis of various biological sequences and of structural variations in proteins.