Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs.

Authors

Chen Ke, Kurgan Lukasz A, Ruan Jishou

Affiliation

Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada.

Publication

BMC Struct Biol. 2007 Apr 16;7:25. doi: 10.1186/1472-6807-7-25.

DOI:10.1186/1472-6807-7-25

PMID:17437643

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1863424/

Abstract

BACKGROUND

Traditionally, the native structure of a protein has been believed to correspond to a global minimum of its free energy. However, with the growing number of known tertiary (3D) protein structures, researchers have discovered that some proteins can alter their structures in response to a change in their surroundings or with the help of other proteins or ligands. Such structural shifts play a crucial role in protein function. To this end, we propose a machine learning method for the prediction of the flexible/rigid regions of proteins (referred to as FlexRP); the method is based on a novel sequence representation and feature selection. Knowledge of the flexible/rigid regions may provide insights into the protein folding process and 3D structure prediction.

RESULTS

The flexible/rigid regions were defined based on a dataset, which includes protein sequences that have multiple experimental structures, and which was previously used to study the structural conservation of proteins. Sequences drawn from this dataset were represented based on feature sets that were proposed in prior research, such as PSI-BLAST profiles, composition vector and binary sequence encoding, and a newly proposed representation based on frequencies of k-spaced amino acid pairs. These representations were processed by feature selection to reduce the dimensionality. Several machine learning methods for the prediction of flexible/rigid regions and two recently proposed methods for the prediction of conformational changes and unstructured regions were compared with the proposed method. The FlexRP method, which applies Logistic Regression and collocation-based representation with 95 features, obtained 79.5% accuracy. The two runner-up methods, which apply the same sequence representation and Support Vector Machines (SVM) and Naïve Bayes classifiers, obtained 79.2% and 78.4% accuracy, respectively. The remaining considered methods are characterized by accuracies below 70%. Finally, the Naïve Bayes method is shown to provide the highest sensitivity for the prediction of flexible regions, while FlexRP and SVM give the highest sensitivity for rigid regions.
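To make the k-spaced pair representation concrete, the following is a minimal sketch of how such frequencies could be computed for one sequence segment. The abstract does not state the range of k, the segment or window length, or the normalization, so the function name `k_spaced_pair_frequencies`, the default `max_k=2`, and the division by the number of available position pairs are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def k_spaced_pair_frequencies(segment, max_k=2):
    """Return {(a, b, k): frequency} for every ordered residue pair (a, b)
    separated by exactly k intervening residues, for k = 0..max_k.

    Assumption: each frequency is the pair count divided by the number of
    position pairs available at that spacing in the segment.
    """
    features = {}
    for k in range(max_k + 1):
        gap = k + 1  # index distance between the two residues
        total = max(len(segment) - gap, 0)
        counts = {pair: 0 for pair in product(AMINO_ACIDS, repeat=2)}
        for i in range(total):
            pair = (segment[i], segment[i + gap])
            if pair in counts:  # ignore non-standard symbols such as 'X'
                counts[pair] += 1
        for (a, b), n in counts.items():
            features[(a, b, k)] = n / total if total else 0.0
    return features


feats = k_spaced_pair_frequencies("ACAC", max_k=1)
print(len(feats))             # 400 ordered pairs for each of 2 spacings
print(feats[("A", "C", 0)])   # adjacent A-C pairs
print(feats[("A", "A", 1)])   # A-A pairs with one residue in between
```

Each spacing contributes 400 candidate features before any selection, which suggests why a feature selection step is applied; the abstract reports that the final FlexRP representation keeps 95 features.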

CONCLUSION

A new sequence representation that uses k-spaced amino acid pairs is shown to be the most efficient in the prediction of the flexible/rigid regions of protein sequences. The proposed FlexRP method provides the highest prediction accuracy of about 80%. The experimental tests show that the FlexRP and SVM methods achieved high overall accuracy and the highest sensitivity for rigid regions, while the best quality of the predictions for flexible regions is achieved by the Naïve Bayes method.
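The comparison above rests on two measures: overall accuracy and per-class sensitivity (the fraction of truly flexible, or truly rigid, positions that are predicted as such). A minimal sketch of both follows; the label symbols `"F"` and `"R"` and the function name `accuracy_and_sensitivity` are hypothetical, and the toy labels are not data from the study.

```python
def accuracy_and_sensitivity(y_true, y_pred, classes=("F", "R")):
    """Return overall accuracy and a dict of per-class sensitivities."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("label sequences must not be empty")
    # Accuracy: fraction of positions where prediction matches the truth.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    sensitivity = {}
    for c in classes:
        # Sensitivity for class c: correctly predicted c / all true c.
        actual = sum(t == c for t in y_true)
        hits = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        sensitivity[c] = hits / actual if actual else float("nan")
    return accuracy, sensitivity


# Toy example: "F" = flexible, "R" = rigid (illustrative labels only).
acc, sens = accuracy_and_sensitivity(list("FFRRRR"), list("FRRRRF"))
print(acc, sens)
```

Reporting sensitivity per class matters here because a method can reach high overall accuracy while favoring one class, which matches the pattern described above: the best accuracy and the best flexible-region sensitivity come from different classifiers.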

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d4b/1863424/e3e71b015dbb/1472-6807-7-25-1.jpg
