Dietmann S, Holm L
Structural Genomics Group, EMBL-EBI, Cambridge CB10 1SD, UK.
Nat Struct Biol. 2001 Nov;8(11):953-7. doi: 10.1038/nsb1101-953.
Structural biology and structural genomics are expected to produce many three-dimensional protein structures in the near future. Each new structure raises questions about its function and evolution. Correct functional and evolutionary classification of a new structure is difficult for distantly related proteins and error-prone using simple statistical scores based on sequence or structure similarity. Here we present an accurate numerical method for the identification of evolutionary relationships (homology). The method is based on the principle that natural selection maintains structural and functional continuity within a diverging protein family. The problem of different rates of structural divergence between different families is solved by first using structural similarities to produce a global map of folds in protein space and then further subdividing fold neighborhoods into superfamilies based on functional similarities. In a validation test against a classification by human experts (SCOP), 77% of homologous pairs were identified with 92% reliability. The method is fully automated, allowing fast, self-consistent and complete classification of large numbers of protein structures. In particular, the discrimination between analogy and homology of close structural neighbors will lead to functional predictions while avoiding overprediction.
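The validation figures above (77% of homologous pairs identified, 92% reliability) can be read as pair-level coverage and reliability measured against a reference classification. The abstract does not give the exact formulas, so the following is only a minimal sketch under that assumption. The function names, the toy protein identifiers, and the group labels are all hypothetical and are not taken from the paper or from SCOP.

```python
from itertools import combinations


def pairs_within_groups(labels):
    """Return the set of unordered item pairs that share a group label.

    `labels` maps an item identifier to its group (e.g. a superfamily).
    """
    items = sorted(labels)
    return {
        frozenset((a, b))
        for a, b in combinations(items, 2)
        if labels[a] == labels[b]
    }


def coverage_and_reliability(predicted, reference):
    """Compare predicted homologous pairs against a reference classification.

    Assumed definitions (not confirmed by the abstract):
      coverage    = correctly predicted pairs / all reference pairs
      reliability = correctly predicted pairs / all predicted pairs
    """
    predicted_pairs = pairs_within_groups(predicted)
    reference_pairs = pairs_within_groups(reference)
    correct = len(predicted_pairs & reference_pairs)
    coverage = correct / len(reference_pairs) if reference_pairs else 0.0
    reliability = correct / len(predicted_pairs) if predicted_pairs else 0.0
    return coverage, reliability


# Toy data invented for illustration only.
reference = {"A": "sf1", "B": "sf1", "C": "sf1", "D": "sf2", "E": "sf2"}
predicted = {"A": "g1", "B": "g1", "C": "g2", "D": "g2", "E": "g2"}

coverage, reliability = coverage_and_reliability(predicted, reference)
print(f"coverage={coverage:.2f} reliability={reliability:.2f}")
```

Counting unordered pairs rather than individual proteins keeps the comparison independent of how the groups are named in either classification.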