Structural analysis based on state-space modeling.

Author information

Stultz C M, White J V, Smith T F

Affiliation

Committee on Higher Degrees on Biophysics, Harvard University, Cambridge, Massachusetts 02138.

Publication information

Protein Sci. 1993 Mar;2(3):305-14. doi: 10.1002/pro.5560020302.

PMID:8453370

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2142382/

Abstract

A new method has been developed to compute the probability that each amino acid in a protein sequence is in a particular secondary structural element. Each of these probabilities is computed using the entire sequence and a set of predefined structural class models. This set of structural classes is patterned after Jane Richardson's taxonomy for the domains of globular proteins. For each structural class considered, a mathematical model is constructed to represent constraints on the pattern of secondary structural elements characteristic of that class. These are stochastic models having discrete state spaces (referred to as hidden Markov models by researchers in signal processing and automatic speech recognition). Each model is a mathematical generator of amino acid sequences; the sequence under consideration is modeled as having been generated by one model in the set of candidates. The probability that each model generated the given sequence is computed using a filtering algorithm. The protein is then classified as belonging to the structural class having the most probable model. The secondary structure of the sequence is then analyzed using a "smoothing" algorithm that is optimal for that structural class model. For each residue position in the sequence, the smoother computes the probability that the residue is contained within each of the defined secondary structural elements of the model. This method has two important advantages: (1) the probability of each residue being in each of the modeled secondary structural elements is computed using the totality of the amino acid sequence, and (2) these probabilities are consistent with prior knowledge of realizable domain folds as encoded in each model. As an example of the method's utility, we present its application to flavodoxin, a prototypical alpha/beta protein having a central beta-sheet, and to thioredoxin, which belongs to a similar structural class but shares no significant sequence similarity.
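
The classify-then-smooth procedure described in the abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: it uses the standard scaled forward-backward algorithm for discrete hidden Markov models in place of the paper's filtering and smoothing algorithms, and the two-letter alphabet (H, P), the two-state toy models, the equal priors, and the names forward_backward, classify, EMIT, and MODELS are all assumptions introduced here. The forward pass yields each model's likelihood for the sequence, which is combined with the prior to give the probability of each model; the backward pass then gives, for the winning model, the probability of each state at each position given the whole sequence.

```python
from math import exp, log


def forward_backward(seq, init, trans, emit):
    """Scaled forward-backward pass for a discrete-state HMM.

    Returns (log_likelihood, posteriors); posteriors[t][s] is the probability
    that position t is in state s given the entire sequence.
    """
    n, states = len(seq), range(len(init))
    alpha, scale = [], []
    # Forward ("filtering") pass, normalizing each row to avoid underflow.
    for t, sym in enumerate(seq):
        if t == 0:
            row = [init[s] * emit[s][sym] for s in states]
        else:
            row = [emit[s][sym] * sum(alpha[t - 1][r] * trans[r][s] for r in states)
                   for s in states]
        c = sum(row)
        scale.append(c)
        alpha.append([x / c for x in row])
    log_lik = sum(log(c) for c in scale)
    # Backward pass, scaled by the same constants as the forward pass.
    beta = [[1.0] * len(init) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        sym = seq[t + 1]
        beta[t] = [sum(trans[s][r] * emit[r][sym] * beta[t + 1][r] for r in states)
                   / scale[t + 1] for s in states]
    # Smoothed posteriors: product of the scaled forward and backward terms.
    post = [[alpha[t][s] * beta[t][s] for s in states] for t in range(n)]
    return log_lik, post


def classify(seq, models):
    """Pick the most probable model, then return its smoothed state posteriors."""
    results = {name: forward_backward(seq, init, trans, emit)
               for name, (prior, init, trans, emit) in models.items()}
    log_joint = {name: log(models[name][0]) + results[name][0] for name in models}
    top = max(log_joint.values())
    norm = sum(exp(v - top) for v in log_joint.values())
    model_prob = {name: exp(v - top) / norm for name, v in log_joint.items()}
    best = max(model_prob, key=model_prob.get)
    return best, model_prob, results[best][1]


# Hypothetical toy models over a two-letter alphabet: H (hydrophobic), P (polar).
EMIT = [{"H": 0.9, "P": 0.1}, {"H": 0.1, "P": 0.9}]
MODELS = {
    # name: (prior, initial distribution, transition matrix, emissions)
    "alternating": (0.5, [0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], EMIT),
    "blocky": (0.5, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], EMIT),
}

best, model_prob, posteriors = classify("HPHPHPHP", MODELS)
print(best, model_prob)
for residue, row in zip("HPHPHPHP", posteriors):
    print(residue, [round(p, 3) for p in row])
```

In a realistic setting each model would have many more states (for example, one group of states per secondary structural element, with transitions that encode the allowed order of elements in a structural class) and emissions over the 20 amino acids. The per-position rows of the posterior matrix would then be summed over the states belonging to each element to obtain the element probabilities the abstract describes.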
