Neighbor-dependent Ramachandran probability distributions of amino acids developed from a hierarchical Dirichlet process model.

Affiliations

Department of Statistics, University of California Berkeley, Berkeley, California, United States of America.

Publication information

PLoS Comput Biol. 2010 Apr 29;6(4):e1000763. doi: 10.1371/journal.pcbi.1000763.

DOI:10.1371/journal.pcbi.1000763

PMID:20442867

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2861699/

Abstract

Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp.
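The counts and the idea of sharing information between densities can be illustrated with a small sketch. It is not the authors' hierarchical Dirichlet process model: it swaps in a much simpler Dirichlet-multinomial shrinkage on a coarse (phi, psi) grid, in which a neighbor-specific histogram is pulled toward the pooled, neighbor-independent one. The reading of 420 as 21 central types times 20 neighbor types is an assumption consistent with the totals in the abstract, and the names `CENTRAL_TYPES`, `neighbor_contexts`, `bin_index`, `shrunken_probs`, the 60-degree bin width, and the `strength` parameter are all hypothetical.

```python
from collections import Counter
from itertools import product

# 20 standard amino acids; proline is split into cis and trans,
# giving 21 central residue types (assumed reading of the abstract).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CENTRAL_TYPES = [aa for aa in AMINO_ACIDS if aa != "P"] + ["cis-P", "trans-P"]


def neighbor_contexts():
    """All (central type, neighbor residue) pairs for one side (left or right)."""
    return list(product(CENTRAL_TYPES, AMINO_ACIDS))


def bin_index(phi, psi, width=60):
    """Map (phi, psi) in degrees to a grid cell, wrapping at +/-180."""
    n = 360 // width
    return (int((phi + 180) % 360 // width) % n,
            int((psi + 180) % 360 // width) % n)


def shrunken_probs(context_angles, pooled_angles, strength=10.0, width=60):
    """Posterior-mean cell probabilities for one neighbor context.

    The prior is Dirichlet(strength * parent), where parent is the smoothed
    pooled histogram. Sparse contexts stay close to the pooled distribution;
    well-populated contexts are dominated by their own counts.
    """
    n = 360 // width
    cells = list(product(range(n), repeat=2))

    # Parent (neighbor-independent) distribution with add-one smoothing.
    pooled = Counter(bin_index(p, s, width) for p, s in pooled_angles)
    total_pooled = sum(pooled.values())
    parent = {c: (pooled[c] + 1) / (total_pooled + len(cells)) for c in cells}

    # Context-specific counts blended with the parent.
    local = Counter(bin_index(p, s, width) for p, s in context_angles)
    total_local = sum(local.values())
    return {c: (local[c] + strength * parent[c]) / (total_local + strength)
            for c in cells}


if __name__ == "__main__":
    print(len(CENTRAL_TYPES), len(neighbor_contexts()))
    # Toy angles only, not data from the paper.
    pooled = [(-60, -45)] * 50 + [(-120, 130)] * 50
    context = [(-60, -45)] * 5
    probs = shrunken_probs(context, pooled)
    print(round(sum(probs.values()), 6))
```

A fixed grid with a single `strength` value keeps the example short. The paper instead estimates continuous densities with Bayesian nonparametric methods, so this sketch conveys only the direction of the shrinkage, not its form.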

Similar articles

A new clustering and nomenclature for beta turns derived from high-resolution protein structures.

PLoS Comput Biol. 2019 Mar 7;15(3):e1006844. doi: 10.1371/journal.pcbi.1006844. eCollection 2019 Mar.

Using Dirichlet mixture priors to derive hidden Markov models for protein families.

Proc Int Conf Intell Syst Mol Biol. 1993;1:47-55.

A Bayesian-probability-based method for assigning protein backbone dihedral angles based on chemical shifts and local sequences.

J Biomol NMR. 2007 Jan;37(1):31-41. doi: 10.1007/s10858-006-9097-7. Epub 2006 Dec 7.

Predicting dihedral angle probability distributions for protein coil residues from primary sequence using neural networks.

BMC Bioinformatics. 2009 Oct 16;10:338. doi: 10.1186/1471-2105-10-338.

On residues in the disallowed region of the Ramachandran map.

Biopolymers. 2002 Mar;63(3):195-206. doi: 10.1002/bip.10051.

Assessing side-chain perturbations of the protein backbone: a knowledge-based classification of residue Ramachandran space.

J Mol Biol. 2008 May 2;378(3):749-58. doi: 10.1016/j.jmb.2008.02.043. Epub 2008 Feb 29.

Construction and comparison of the statistical coil states of unfolded and intrinsically disordered proteins from nearest-neighbor corrected conformational propensities of short peptides.

Mol Biosyst. 2016 Oct 18;12(11):3294-3306. doi: 10.1039/c6mb00489j.

Nearest-neighbor effects on backbone alpha and beta carbon chemical shifts in proteins.

J Biomol NMR. 2007 Nov;39(3):247-57. doi: 10.1007/s10858-007-9193-3.

Randomizing of Oligopeptide Conformations by Nearest Neighbor Interactions between Amino Acid Residues.

Biomolecules. 2022 May 11;12(5):684. doi: 10.3390/biom12050684.

Cited by

Amber ff24EXP-GA, Based on Empirical Ramachandran Distributions of Glycine and Alanine Residues in Water.

J Chem Theory Comput. 2025 Mar 11;21(5):2515-2534. doi: 10.1021/acs.jctc.4c01450. Epub 2025 Feb 20.

How hydrophobicity, side chains, and salt affect the dimensions of disordered proteins.

Protein Sci. 2024 May;33(5):e4986. doi: 10.1002/pro.4986.

IDPConformerGenerator: A Flexible Software Suite for Sampling the Conformational Space of Disordered Protein States.

J Phys Chem A. 2022 Sep 8;126(35):5985-6003. doi: 10.1021/acs.jpca.2c03726. Epub 2022 Aug 28.

Exploring Nearest Neighbor Interactions and Their Influence on the Gibbs Energy Landscape of Unfolded Proteins and Peptides.

Int J Mol Sci. 2022 May 18;23(10):5643. doi: 10.3390/ijms23105643.

Randomizing of Oligopeptide Conformations by Nearest Neighbor Interactions between Amino Acid Residues.

Biomolecules. 2022 May 11;12(5):684. doi: 10.3390/biom12050684.

Structural Prediction of Peptide-MHC Binding Modes.

Methods Mol Biol. 2022;2405:245-282. doi: 10.1007/978-1-0716-1855-4_13.

Quantitative Assessment of Chirality of Protein Secondary Structures and Phenylalanine Peptide Nanotubes.

Nanomaterials (Basel). 2021 Dec 5;11(12):3299. doi: 10.3390/nano11123299.

Current Approaches in Supersecondary Structures Investigation.

Int J Mol Sci. 2021 Nov 2;22(21):11879. doi: 10.3390/ijms222111879.

Accurate prediction of protein torsion angles using evolutionary signatures and recurrent neural network.

Sci Rep. 2021 Oct 26;11(1):21033. doi: 10.1038/s41598-021-00477-2.

DIPEND: An Open-Source Pipeline to Generate Ensembles of Disordered Segments Using Neighbor-Dependent Backbone Preferences.

Biomolecules. 2021 Oct 12;11(10):1505. doi: 10.3390/biom11101505.
