Life Science Research Centre, Laboratory of Bioinformatics, 14 Gotua St, Tbilisi, 0160, Georgia.
J Biomol Struct Dyn. 2012;30(2):180-90. doi: 10.1080/07391102.2012.677769.
Sequence alignment is a standard method for estimating the evolutionary, structural, and functional relationships among amino acid sequences. The quality of an alignment depends on the similarity matrix used. Statistical contact potentials (CPs) contain information on contact propensities among residues in native protein structures. Substitution matrices (SMs) based on CPs are applicable to the comparison of distantly related sequences. Here, contact between amino acids was estimated from the distances between side-chain terminal groups (SCTGs), which are defined as the groups of side-chain heavy atoms with fixed distances between them. In this paper, two new types of CPs and similarity matrices were constructed: one based on a fixed cutoff distance obtained from the geometric characteristics of the SCTGs (TGC1), and the other a distance-dependent potential (TGC2). These matrices were compared with other popular SMs. The performance of the matrices was evaluated by comparing sequence alignments with structural alignments. The results show that TGC2 performs best among the contact-based matrices, but overall, contact-based matrices perform slightly worse than other SMs, except at the fold level of similarity.
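To make the fixed-cutoff idea concrete, the following is a minimal sketch of how a contact potential might be derived from contact counts. It is not the authors' procedure: the abstract gives neither the TGC1 cutoff nor the reference state, so the 6.5 Å cutoff, the log-odds form with a composition-based expectation, the function names, and the use of a single representative point per SCTG are all illustrative assumptions. Sequence-adjacent pairs are not excluded here, although a real analysis would likely exclude them.

```python
import math
from collections import Counter
from itertools import combinations


def contact_counts(residues, cutoff=6.5):
    """Count residue-type pairs whose representative points lie within `cutoff`.

    `residues` is a list of (amino_acid, (x, y, z)) tuples. The coordinate is
    assumed to be one representative point per side-chain terminal group;
    the cutoff value is a placeholder, not the value used for TGC1.
    """
    counts = Counter()
    for (a, pa), (b, pb) in combinations(residues, 2):
        if math.dist(pa, pb) <= cutoff:
            # Sort the pair so that (K, E) and (E, K) share one key.
            counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds_potential(counts, composition):
    """Return e_ij = -ln(observed / expected) for each observed pair.

    The expected count assumes random pairing by residue frequency, which is
    one common reference state and an assumption of this sketch.
    """
    total_contacts = sum(counts.values())
    n_residues = sum(composition.values())
    potential = {}
    for (a, b), observed in counts.items():
        fa = composition[a] / n_residues
        fb = composition[b] / n_residues
        # Unlike pairs can form in two orders, hence the factor of 2.
        expected = total_contacts * fa * fb * (1 if a == b else 2)
        potential[(a, b)] = -math.log(observed / expected)
    return potential


# Toy input: two leucines close together, and a lysine near a glutamate.
toy = [("L", (0, 0, 0)), ("L", (3, 0, 0)), ("K", (20, 0, 0)), ("E", (22, 0, 0))]
toy_counts = contact_counts(toy)
toy_potential = log_odds_potential(toy_counts, Counter(aa for aa, _ in toy))
```

In this sketch, negative values mark pairs that are in contact more often than the composition alone would predict. A distance-dependent potential such as TGC2 would presumably bin the distances rather than apply a single threshold, but the abstract does not specify the binning.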