Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.
Proc Natl Acad Sci U S A. 2010 Dec 14;107(50):21476-81. doi: 10.1073/pnas.1012095107. Epub 2010 Nov 22.
Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of the same parallel search paradigm to protein structure determination, taking advantage of the large and growing corpus of known structures. Such searches were previously computationally intractable. With Wide Search Molecular Replacement, the method developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that template structures with small structural coverage (less than 12%) and low sequence identity (less than 20%) can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes, for which no homologous protein folds or sequences are known, can benefit significantly from such a technique. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of the minimum sequence identity, structural coverage, and structural similarity required for the method to succeed. We describe how this structure-search approach and other novel, computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins (SCOP) protein fragment database.