

A DIRICHLET PROCESS MIXTURE OF HIDDEN MARKOV MODELS FOR PROTEIN STRUCTURE PREDICTION.

Authors

Lennox Kristin P, Dahl David B, Vannucci Marina, Day Ryan, Tsai Jerry W

Affiliation

Department of Statistics, Texas A&M University, 3143 TAMU, College Station, Texas 77843-3143, USA.

Publication information

Ann Appl Stat. 2010 Jun 1;4(2):916-942. doi: 10.1214/09-AOAS296.

DOI:10.1214/09-AOAS296
PMID:21031154
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2964143/
Abstract

By providing new insights into the distribution of a protein's torsion angles, recent statistical models for this data have pointed the way to more efficient methods for protein structure prediction. Most current approaches have concentrated on bivariate models at a single sequence position. There is, however, considerable value in simultaneously modeling angle pairs at multiple sequence positions in a protein. One area of application for such models is in structure prediction for the highly variable loop and turn regions. Such modeling is difficult due to the fact that the number of known protein structures available to estimate these torsion angle distributions is typically small. Furthermore, the data is "sparse" in that not all proteins have angle pairs at each sequence position. We propose a new semiparametric model for the joint distributions of angle pairs at multiple sequence positions. Our model accommodates sparse data by leveraging known information about the behavior of protein secondary structure. We demonstrate our technique by predicting the torsion angles in a loop from the globin fold family. Our results show that a template-based approach can now be successfully extended to modeling the notoriously difficult loop and turn regions.
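
The "sparse" data structure described in the abstract can be pictured with a small sketch. The code below is only an illustration of the data layout, not the authors' semiparametric model: it assumes each protein contributes one row of (phi, psi) torsion angle pairs in radians, with `None` where that protein has no residue at an aligned sequence position. The names `circular_mean`, `per_position_summary`, and `toy_alignment`, and all angle values, are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# One (phi, psi) pair in radians, or None where the protein has a gap
# at this aligned sequence position (assumed representation).
AnglePair = Optional[Tuple[float, float]]


def circular_mean(angles: List[float]) -> float:
    """Mean direction of a list of angles in radians.

    Angles wrap around, so an arithmetic mean is misleading
    (179 and -179 degrees should average near 180, not 0).
    """
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def per_position_summary(alignment: List[List[AnglePair]]):
    """Return (count observed, mean phi, mean psi) for each aligned position."""
    n_positions = len(alignment[0])
    summary = []
    for j in range(n_positions):
        # Keep only proteins that actually have an angle pair here.
        observed = [row[j] for row in alignment if row[j] is not None]
        if not observed:
            summary.append((0, None, None))
            continue
        phi = circular_mean([p for p, _ in observed])
        psi = circular_mean([q for _, q in observed])
        summary.append((len(observed), phi, psi))
    return summary


# Hypothetical toy data: three proteins, three aligned positions.
deg = math.radians
toy_alignment = [
    [(deg(-60), deg(-45)), (deg(-65), deg(-40)), None],
    [(deg(-70), deg(-35)), None, (deg(60), deg(45))],
    [(deg(-65), deg(-40)), (deg(-55), deg(-50)), None],
]
summary = per_position_summary(toy_alignment)
```

In this toy layout the third position is observed in only one protein, which is the kind of gap the abstract refers to. Summarizing each position separately, as done here, ignores dependence between positions; the paper's stated aim is a joint model across positions.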



