Near-native protein loop sampling using nonparametric density estimation accommodating sparcity.

Affiliation

Department of Chemistry, University of the Pacific, Stockton, California, United States of America.

Publication information

PLoS Comput Biol. 2011 Oct;7(10):e1002234. doi: 10.1371/journal.pcbi.1002234. Epub 2011 Oct 20.

DOI:10.1371/journal.pcbi.1002234

PMID:22028638

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC3197639/

Abstract

In contrast to the core structural elements of a protein, such as regular secondary structure, loop regions are difficult for template-based modeling (TBM) because of their variability in sequence and structure and because only sparse samples are available from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are generated quickly from samples drawn from these distributions and are enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates with RMSD as low as 0.45 Å, and 3.66 Å in the worst case, were produced. For canonical loops such as the immunoglobulin complementarity-determining regions (mean RMSD <2.0 Å), the DPM-HMM method performs as well as or better than the best templates, demonstrating that our automated method recaptures these canonical loops without inclusion of any IgG-specific terms or manual intervention. In cases with poor or few good templates (mean RMSD >7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct comparison of sampling with the Loopy algorithm, our method demonstrates the ability to sample structures nearer to native for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, under the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM to 90 loops from 45 TBM targets shows the general applicability of our sampling method to the loop modeling problem. These results demonstrate that our DPM-HMM provides an advantage by consistently sampling near-native loop structures.
The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/.
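
The sample-then-filter idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software and does not implement the DPM-HMM: the torsion density here is a stand-in (an assumed two-component mixture of independent von Mises distributions per position), the backbone is built with idealized bond lengths and angles, and the end-to-end distance is assumed to be the distance between the first and last Cα atoms. All function names, parameter values, and the tolerance are hypothetical.

```python
import numpy as np

# Idealized backbone geometry (assumed textbook approximations, in Å and degrees).
BOND = {"N_CA": 1.46, "CA_C": 1.52, "C_N": 1.33}
ANGLE = {k: np.deg2rad(v) for k, v in
         {"N_CA_C": 111.0, "CA_C_N": 116.2, "C_N_CA": 121.7}.items()}
OMEGA = np.pi  # assume every peptide bond is trans


def place_atom(a, b, c, length, angle, torsion):
    """Place atom d so that |cd| = length, angle(b, c, d) = angle,
    and torsion(a, b, c, d) = torsion (internal-to-Cartesian step)."""
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)
    return (c
            - length * np.cos(angle) * bc
            + length * np.sin(angle) * np.cos(torsion) * m
            + length * np.sin(angle) * np.sin(torsion) * n)


def build_backbone(phi, psi):
    """Return an (n_res, 3, 3) array of N, CA, C coordinates from torsions (radians).
    phi[0] and psi[-1] are unused because their defining atoms lie outside the loop."""
    n0 = np.zeros(3)
    ca0 = np.array([BOND["N_CA"], 0.0, 0.0])
    c0 = ca0 + BOND["CA_C"] * np.array(
        [-np.cos(ANGLE["N_CA_C"]), np.sin(ANGLE["N_CA_C"]), 0.0])
    atoms = [[n0, ca0, c0]]
    for i in range(1, len(phi)):
        n_prev, ca_prev, c_prev = atoms[-1]
        n_i = place_atom(n_prev, ca_prev, c_prev,
                         BOND["C_N"], ANGLE["CA_C_N"], psi[i - 1])
        ca_i = place_atom(ca_prev, c_prev, n_i,
                          BOND["N_CA"], ANGLE["C_N_CA"], OMEGA)
        c_i = place_atom(c_prev, n_i, ca_i,
                         BOND["CA_C"], ANGLE["N_CA_C"], phi[i])
        atoms.append([n_i, ca_i, c_i])
    return np.array(atoms)


def sample_torsions(rng, n_res, kappa=8.0):
    """STAND-IN density (assumption, not the DPM-HMM): each position picks a
    helical or extended centre, then draws phi and psi from von Mises noise."""
    centres = np.deg2rad([[-63.0, -43.0], [-120.0, 130.0]])
    mu = centres[rng.integers(0, 2, size=n_res)]
    return rng.vonmises(mu[:, 0], kappa), rng.vonmises(mu[:, 1], kappa)


def sample_loops(n_res, target_dist, tol=1.0, n_samples=2000, seed=0):
    """Keep sampled backbones whose first-to-last CA distance is within
    tol Å of target_dist (the end-to-end distance filter)."""
    rng = np.random.default_rng(seed)
    kept = []
    for _ in range(n_samples):
        phi, psi = sample_torsions(rng, n_res)
        coords = build_backbone(phi, psi)
        dist = np.linalg.norm(coords[-1, 1] - coords[0, 1])
        if abs(dist - target_dist) <= tol:
            kept.append(coords)
    return kept


if __name__ == "__main__":
    loops = sample_loops(n_res=8, target_dist=12.0)
    print(f"kept {len(loops)} of 2000 sampled backbones")
```

In this sketch the filter is a cheap geometric check applied before any further scoring, so candidates that cannot span the gap between the loop anchors are discarded early. A real application would take the target distance from the anchor residues of the template-based model.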
