Neighbor-dependent Ramachandran probability distributions of amino acids developed from a hierarchical Dirichlet process model.

Author information

Department of Statistics, University of California Berkeley, Berkeley, California, United States of America.

Publication information

PLoS Comput Biol. 2010 Apr 29;6(4):e1000763. doi: 10.1371/journal.pcbi.1000763.

Abstract

Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp.
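As a rough illustration of what a smooth probability density on the Ramachandran map looks like, the sketch below estimates one from a handful of (phi, psi) pairs with a periodic product kernel. It is a plain kernel density estimate, not the hierarchical Dirichlet process model used in this work, and it includes no neighbor dependence. The function name `ramachandran_kde`, the concentration parameter `kappa`, the grid spacing `step`, and the toy angles are all assumptions made for the example.

```python
import math


def ramachandran_kde(angles_deg, kappa=20.0, step=10):
    """Estimate probability mass per grid cell on the (phi, psi) torus.

    angles_deg: list of (phi, psi) pairs in degrees.
    kappa: hypothetical kernel concentration (larger means narrower kernels).
    step: grid spacing in degrees; must divide 360.
    Returns a dict mapping (phi, psi) cell centers to probability mass.
    """
    # Cell centers, e.g. -175, -165, ..., 175 for step=10.
    centers = [-180 + step / 2 + i * step for i in range(360 // step)]
    mass = {}
    for phi in centers:
        for psi in centers:
            total = 0.0
            for p, s in angles_deg:
                dphi = math.radians(phi - p)
                dpsi = math.radians(psi - s)
                # Product of two von Mises-shaped kernels. Using cos() makes
                # the kernel periodic, so -180 and 180 are the same angle.
                total += math.exp(kappa * (math.cos(dphi) + math.cos(dpsi) - 2.0))
            mass[(phi, psi)] = total
    # Normalize numerically so the cell masses sum to 1 over the grid.
    z = sum(mass.values())
    return {cell: m / z for cell, m in mass.items()}


# Toy, made-up angles near the alpha-helical region (not data from the paper).
toy_angles = [(-63.0, -43.0), (-60.0, -45.0), (-65.0, -40.0)]
density = ramachandran_kde(toy_angles)
most_probable_cell = max(density, key=density.get)
print(most_probable_cell, density[most_probable_cell])
```

A histogram would assign each observation to exactly one cell. The kernel estimate spreads each observation over neighboring cells and wraps correctly at the ±180 degree boundary, which is the minimum a Ramachandran density needs before any sharing of information across residue types is considered.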
