Geourjon C, Combet C, Blanchet C, Deléage G
Pôle BioInformatique Lyonnais, Institut de Biologie et Chimie des Protéines, Centre National de la Recherche Scientifique, UMR 5086, 69 367 Lyon CEDEX 07, France.
Protein Sci. 2001 Apr;10(4):788-97. doi: 10.1110/ps.30001.
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Even with the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise comparisons of secondary structures were measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure similarity is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E-value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins, to select proteins for experimental investigation in a structural genomics approach, and to support genome annotation.
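The selection step described in the abstract can be illustrated with a minimal Python sketch. It assumes that each hit from a sequence search already carries a precomputed Sov score between the query and target secondary structures. The `Hit` class, the `select_candidates` function, the field names, and the example values are all hypothetical and serve only as illustration; the 50% Sov threshold and the 10%-30% identity window are taken from the abstract, and the Sov calculation itself is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One database hit from a sequence search (hypothetical record layout)."""
    target_id: str
    e_value: float       # E-value reported by the search program
    identity_pct: float  # percent sequence identity over the alignment
    sov_pct: float       # precomputed Sov between query and target secondary structures


def select_candidates(hits, sov_threshold=50.0, min_identity=10.0, max_identity=30.0):
    """Keep hits inside the identity window whose Sov exceeds the threshold.

    Results are sorted by decreasing Sov so the most similar secondary
    structures come first.
    """
    kept = [
        h for h in hits
        if min_identity <= h.identity_pct <= max_identity
        and h.sov_pct > sov_threshold
    ]
    return sorted(kept, key=lambda h: h.sov_pct, reverse=True)


# Made-up example values, for illustration only (not data from the article).
hits = [
    Hit("protA", e_value=5.0, identity_pct=18.0, sov_pct=72.0),
    Hit("protB", e_value=8.0, identity_pct=22.0, sov_pct=35.0),
    Hit("protC", e_value=0.9, identity_pct=45.0, sov_pct=80.0),
    Hit("protD", e_value=12.0, identity_pct=12.0, sov_pct=55.0),
]

selected = select_candidates(hits)
print([h.target_id for h in selected])
```

In this sketch the E-value is stored but not used as a filter, since the point of the strategy is to rescue hits from the high E-value region on the basis of secondary structure similarity.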