Hegyi H, Gerstein M
Department of Molecular Biophysics & Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA.
J Mol Biol. 1999 Apr 23;288(1):147-64. doi: 10.1006/jmbi.1999.2661.
For most proteins in the genome databases, function is predicted via sequence comparison. In spite of the popularity of this approach, the extent to which it can be reliably applied is unknown. We address this issue by systematically investigating the relationship between protein function and structure. We focus initially on enzymes functionally classified by the Enzyme Commission (EC) and relate these to structurally classified domains in the SCOP database. We find that the major SCOP fold classes have different propensities to carry out certain broad categories of functions. For instance, alpha/beta folds are disproportionately associated with enzymes, especially transferases and hydrolases, and all-alpha and small folds with non-enzymes, while alpha+beta folds have an equal tendency either way. These observations for the database overall are largely true for specific genomes. We focus, in particular, on yeast, analyzing it with many classifications in addition to SCOP and EC (i.e. COGs, CATH, MIPS), and find clear tendencies for fold-function association across a broad spectrum of functions. Analysis with the COGs scheme also suggests that the functions of the most ancient proteins are more evenly distributed among different structural classes than those of more modern ones. For the database overall, we identify the most versatile functions, i.e. those that are associated with the most folds, and the most versatile folds, i.e. those associated with the most functions. The two most versatile enzymatic functions (hydro-lyases and O-glycosyl glucosidases) are associated with seven folds each. The five most versatile folds (TIM-barrel, Rossmann, ferredoxin, alpha-beta hydrolase, and P-loop NTP hydrolase) are all mixed alpha-beta structures. They stand out as generic scaffolds, accommodating from six to as many as 16 functions (for the exceptional TIM-barrel).
At the conclusion of our analysis we are able to construct a graph giving the chance that a functional annotation can be reliably transferred at different degrees of sequence and structural similarity. Supplemental information is available from http://bioinfo.mbb.yale.edu/genome/foldfunc++ +.
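The notion of versatility used above (a function is versatile if it is associated with many folds, and a fold is versatile if it is associated with many functions) can be illustrated with a minimal sketch. This is not the authors' code: the function name `versatility` and the labels `fold_A`, `func_1`, and so on are hypothetical placeholders, and the pairs are toy data rather than SCOP or EC assignments.

```python
from collections import defaultdict

# Hypothetical (fold, function) pairs. The labels are placeholders,
# not real SCOP folds or EC classes, and not data from the paper.
pairs = [
    ("fold_A", "func_1"),
    ("fold_A", "func_2"),
    ("fold_A", "func_3"),
    ("fold_B", "func_1"),
    ("fold_B", "func_2"),
    ("fold_C", "func_1"),
    ("fold_A", "func_1"),  # repeated observation, counted only once
]


def versatility(pairs):
    """Count distinct functions per fold and distinct folds per function."""
    fold_to_funcs = defaultdict(set)
    func_to_folds = defaultdict(set)
    for fold, func in pairs:
        fold_to_funcs[fold].add(func)
        func_to_folds[func].add(fold)
    funcs_per_fold = {fold: len(funcs) for fold, funcs in fold_to_funcs.items()}
    folds_per_func = {func: len(folds) for func, folds in func_to_folds.items()}
    return funcs_per_fold, folds_per_func


funcs_per_fold, folds_per_func = versatility(pairs)

# The most versatile fold carries the most distinct functions;
# the most versatile function occurs on the most distinct folds.
most_versatile_fold = max(funcs_per_fold, key=funcs_per_fold.get)
most_versatile_func = max(folds_per_func, key=folds_per_func.get)
```

Sets are used so that several proteins sharing the same fold and function contribute a single fold-function association, which is the quantity the versatility counts describe.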