Application of data mining tools for classification of protein structural class from residue based averaged NMR chemical shifts.

Authors

Kumar Arun V, Ali Rehana F M, Cao Yu, Krishnan V V

Affiliations

Department of Computer Science, California State University, Fresno, CA 93740, United States.

Department of Chemistry, California State University, Fresno, CA 93740, United States; Department of Pathology and Laboratory Medicine, School of Medicine, University of California, Davis, CA 95616, United States.

Publication information

Biochim Biophys Acta. 2015 Oct;1854(10 Pt A):1545-52. doi: 10.1016/j.bbapap.2015.02.016. Epub 2015 Mar 7.

DOI:10.1016/j.bbapap.2015.02.016

PMID:25758094

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4547871/

Abstract

The number of protein sequences emerging from genome sequencing projects is outpacing our knowledge of the function of these proteins. As the gap between experimentally characterized and uncharacterized proteins continues to widen, new computational methods and tools are needed to obtain protein structural information that is directly related to function. Nuclear magnetic resonance (NMR) provides a powerful means to determine three-dimensional structures of proteins in the solution state. However, translating NMR spectral parameters into even low-resolution structural information, such as protein class, requires multiple time-consuming steps. In this paper, we present an unorthodox method that uses machine learning algorithms to predict the protein structural class directly from the residue-based averaged chemical shifts (ACS). Experimental chemical shift information for 1491 proteins obtained from the Biological Magnetic Resonance Bank (BMRB), together with their respective structural classes derived from the Structural Classification of Proteins (SCOP), was used to construct a data set with 119 attributes and 5 different classes. Twenty-four different classification schemes were evaluated using several performance measures. Overall, the residue-based ACS values can predict the protein structural classes with 80% accuracy, as measured by the Matthews correlation coefficient. Specifically, protein classes defined by mixed αβ or small proteins are classified with >90% correlation. Our results indicate that this NMR-based method can be used as a low-resolution tool for protein structural class identification without any prior chemical shift assignments.
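To make the two quantities in the abstract concrete, here is a minimal sketch of how averaged chemical shifts and a per-class Matthews correlation coefficient could be computed. It is an illustration under assumptions, not the authors' pipeline: the function names, the grouping of shifts by (residue type, atom name), the input tuples, and the class labels are all hypothetical, and the abstract does not state how the 119 attributes are defined or which classifiers were used.

```python
from collections import defaultdict
from math import sqrt


def averaged_chemical_shifts(shifts):
    """Average chemical shifts (ppm) per (residue type, atom name) pair.

    `shifts` is an iterable of (residue, atom, ppm) tuples. Grouping by
    residue type and atom name is an assumption made for illustration.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for residue, atom, ppm in shifts:
        sums[(residue, atom)] += ppm
        counts[(residue, atom)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def one_vs_rest_mcc(y_true, y_pred, label):
    """Matthews correlation coefficient for `label` versus all other classes."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == label:
            if truth == label:
                tp += 1
            else:
                fp += 1
        else:
            if truth == label:
                fn += 1
            else:
                tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0.0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy input: three shifts from one protein.
example_shifts = [("ALA", "CA", 52.0), ("ALA", "CA", 54.0), ("GLY", "CA", 45.0)]
features = averaged_chemical_shifts(example_shifts)

# Hypothetical class labels and predictions for four proteins.
true_classes = ["alpha", "alpha", "beta", "beta"]
predicted = ["alpha", "beta", "beta", "beta"]
print(features)
print(one_vs_rest_mcc(true_classes, predicted, "alpha"))
```

The one-vs-rest form is used here because the abstract reports correlation for individual classes (mixed αβ, small proteins); a single multiclass coefficient would need the full confusion matrix instead.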
