Relationships between functional subclasses and information contained in active-site and ligand-binding residues in diverse superfamilies.

Affiliation

National Institute of Biomedical Innovation, 7-6-8 Saito-Asagi, Ibaraki, Osaka 567-0085, Japan.

Publication information

Proteins. 2010 Aug 1;78(10):2369-84. doi: 10.1002/prot.22750.

PMID:20544971

Abstract

To investigate the relationships between functional subclasses and the sequence and structural information contained in active-site and ligand-binding residues (LBRs), we performed a detailed analysis of seven diverse enzyme superfamilies: aldolase class I, TIM-barrel glycosidases, alpha/beta-hydrolases, P-loop containing nucleotide triphosphate hydrolases, collagenase, Zn peptidases, and glutamine phosphoribosylpyrophosphate, subunit 1, domain 1. These homologous superfamilies, as defined in CATH, were selected from the enzyme catalytic-mechanism database. We defined active-site residues and LBRs based solely on literature information and complex structures in the Protein Data Bank. From a structure-based multiple sequence alignment for each CATH homologous superfamily, we extracted subsequences consisting of the aligned positions that were used as an active site or a ligand-binding site by at least one sequence. Using both the subsequences and the full-length alignments, we performed cluster analysis with three sequence distance measures. Cluster analysis using the subsequences detected functional subclasses more accurately than clustering using the full-length alignments. Thus, the subsequences, determined solely from literature information and complex structures, contained sufficient information to detect the functional subclasses. Detailed examination of the clustering results provided new insights into the mechanism of functional diversification for these superfamilies.
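
The subsequence-extraction and clustering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the three distance measures or the clustering algorithm, so the p-distance (fraction of differing positions) and average-linkage clustering used here are stand-in assumptions. The sequence names, residues, site annotations, and the 0.5 threshold are hypothetical toy values.

```python
from itertools import combinations

def functional_columns(site_columns):
    """Union of alignment columns annotated as active-site or
    ligand-binding in at least one sequence."""
    cols = set()
    for columns in site_columns.values():
        cols.update(columns)
    return sorted(cols)

def extract_subsequences(alignment, columns):
    """Keep only the selected alignment columns for every sequence."""
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

def p_distance(a, b):
    """Fraction of differing positions (assumed measure, not from the paper).
    Columns where both sequences have a gap are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)

def average_linkage(seqs, threshold):
    """Merge the closest clusters until the smallest average
    inter-cluster distance exceeds the threshold."""
    clusters = [[name] for name in sorted(seqs)]

    def dist(c1, c2):
        total = sum(p_distance(seqs[a], seqs[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        if dist(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)

# Hypothetical toy alignment: enzA* and enzB* are two functional subclasses.
# enzA1/enzB1 share one background, enzA2/enzB2 share another.
alignment = {
    "enzA1": "MKDGSHLA",
    "enzA2": "LRDASYIA",
    "enzB1": "MKEGTHLC",
    "enzB2": "LREATYIC",
}
# Hypothetical annotations: only some sequences have known sites.
site_columns = {"enzA1": [2, 4], "enzB1": [4, 7]}

columns = functional_columns(site_columns)
subseqs = extract_subsequences(alignment, columns)
by_full = average_linkage(alignment, threshold=0.5)
by_sites = average_linkage(subseqs, threshold=0.5)
print("columns:", columns)
print("full-length clusters:", by_full)
print("subsequence clusters:", by_sites)
```

In this toy case the full-length clustering groups sequences by their shared background residues, while the subsequence clustering groups them by functional subclass. The example is constructed to show that effect and says nothing about how large it is in real superfamilies.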
