Harvard-MIT Division of Health Sciences & Technology, and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.
PLoS One. 2010 Feb 23;5(2):e9391. doi: 10.1371/journal.pone.0009391.
Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains is fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed the protein core atomic interaction network (PCAIN), is significantly distinguishable across different folds, and thus appears to be a "signature" of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database, and we developed an automated framework for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with a Cα RMSD of 1-2 Å (mean 1.61 Å) generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains, including those from recent CASP experiments, and most notably in the 'twilight' and 'midnight' zones, wherein <30% and <10% target-template sequence identity prevail, respectively (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol for deriving biological insight into protein structure-function relationships by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from the plague-causing bacterium Yersinia pestis and discussing its implications for the pathogen's modulation of host adaptive and innate immunity.
Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools.
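The Cα RMSD figures quoted above measure the root-mean-square distance between matched Cα atoms of a computed structure and its reference crystal structure. The following is a minimal sketch of that metric only, not the authors' pipeline: the function name `ca_rmsd` and the toy coordinates are hypothetical, and the code assumes the two coordinate lists are already paired residue-by-residue and optimally superposed (the superposition step itself is omitted).

```python
import math


def ca_rmsd(model, reference):
    """Return the RMSD (in angstroms) between two lists of (x, y, z) C-alpha coordinates.

    Assumption: both lists have the same length, atoms are paired by index,
    and the structures have already been superposed.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared per-atom distances over all paired C-alpha atoms.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    # Hypothetical toy coordinates: every model atom is shifted 1 angstrom along y.
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
    print(round(ca_rmsd(model, reference), 2))  # prints 1.0
```

A uniform 1 Å displacement of every atom gives an RMSD of exactly 1 Å, which offers a sense of scale for the 1-2 Å range reported in the abstract.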