A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70%.

Authors

Mehta P K, Heringa J, Argos P

Affiliation

European Molecular Biology Laboratory, Heidelberg, Germany.

Publication information

Protein Sci. 1995 Dec;4(12):2517-25. doi: 10.1002/pro.5560041208.

DOI:10.1002/pro.5560041208

PMID:8580842

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2143048/

Abstract

To improve secondary structure predictions in protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprised of 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate residue exchange weight matrices within alpha-helical, beta-strand, and coil substructures, respectively. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and executable efficiency, the method has considerable advantage, especially with its sole reliance on amino acid substitutions within structurally related proteins.
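The scoring idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the matrix values, the window size, or the exact scoring rule, so the toy weights, the `half_window` parameter, and all function names below are assumptions chosen only for illustration. For each alignment column, the sketch takes the largest exchange weight among the residue pairs observed in that column, sums these values over a local window for each of the three state-specific matrices, and assigns the state with the highest total. A per-residue accuracy helper is included for comparing predicted and observed states.

```python
from itertools import combinations

STATES = ("H", "E", "C")  # alpha-helix, beta-strand, coil


def make_matrix(weights):
    """Build a symmetric exchange matrix keyed by unordered residue pairs."""
    return {frozenset(pair): value for pair, value in weights.items()}


# Hypothetical toy weights; real matrices would be derived from a
# database of aligned protein families, which is not reproduced here.
TOY_MATRICES = {
    "H": make_matrix({"AL": 2.0, "AA": 1.0, "LL": 1.0, "VI": 0.5}),
    "E": make_matrix({"VI": 2.0, "VV": 1.0, "II": 1.0, "AL": 0.5}),
    "C": make_matrix({"GP": 2.0, "GG": 1.0, "PP": 1.0}),
}

TOY_ALIGNMENT = ["AAVVGG", "LLIIPP"]  # two aligned sequences, no gaps


def column_score(matrix, column):
    """Largest exchange weight among residue pairs seen in one column."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    # A column with a single residue is scored against itself.
    pairs = list(combinations(residues, 2)) or [(residues[0], residues[0])]
    return max(matrix.get(frozenset(p), 0.0) for p in pairs)


def predict(alignment, matrices, half_window=1):
    """Assign each column the state with the highest windowed score."""
    length = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(length)]
    states = []
    for i in range(length):
        window = columns[max(0, i - half_window): i + half_window + 1]
        scores = {
            s: sum(column_score(matrices[s], c) for c in window)
            for s in STATES
        }
        states.append(max(STATES, key=scores.get))
    return "".join(states)


def per_residue_accuracy(observed, predicted):
    """Fraction of positions where predicted and observed states agree."""
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


predicted = predict(TOY_ALIGNMENT, TOY_MATRICES)
print(predicted, per_residue_accuracy("HHHECC", predicted))
```

In a jack-knife test of the kind mentioned in the abstract, the matrices would be recomputed with one protein family left out and then used to predict that family; the sketch above omits that step because it requires the underlying alignment database.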
