Prediction of protein structural class using novel evolutionary collocation-based sequence representation.

Authors

Chen Ke, Kurgan Lukasz A, Ruan Jishou

Affiliation

Department of Electrical and Computer Engineering, ECERF, University of Alberta, Edmonton, Alberta, Canada.

Publication information

J Comput Chem. 2008 Jul 30;29(10):1596-604. doi: 10.1002/jcc.20918.

DOI:10.1002/jcc.20918

PMID:18293306

Abstract

Knowledge of structural classes is useful for understanding folding patterns in proteins. Although existing structural class prediction methods have applied virtually all state-of-the-art classifiers, many of them use a relatively simple protein sequence representation that often includes amino acid (AA) composition. To address this, we propose a novel sequence representation that incorporates evolutionary information encoded using PSI-BLAST profile-based collocation of AA pairs. We used six benchmark datasets and five representative classifiers to quantify and compare the quality of structural class prediction with the proposed representation. The best classifier, a support vector machine, achieved 61-96% accuracy on the six datasets. These predictions were comprehensively compared with a wide range of recently proposed methods for prediction of structural classes. The comparison shows the superiority of the proposed representation, which yields error rate reductions ranging between 14% and 26% when compared with predictions of the best-performing previously published classifiers on the considered datasets. The study also shows that, for the three benchmark datasets that include sequences characterized by low identity (i.e., 25%, 30%, and 40%), the prediction accuracies are 20-35% lower than for the other three datasets, which include sequences with a higher degree of similarity. In conclusion, the proposed representation is shown to substantially improve the accuracy of structural class prediction. A web server that implements the presented prediction method is freely available at http://biomine.ece.ualberta.ca/Structural_Class/SCEC.html.
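The abstract does not give the exact formula behind the "PSI-BLAST profile-based collocation of AA pairs", so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the profile is a length-by-20 matrix of per-position scores and that a collocation feature for an ordered AA pair is the length-normalized sum of products of profile entries at two positions separated by a fixed gap. The names `one_hot_profile`, `collocation_features`, and `max_gap` are hypothetical, and a one-hot matrix stands in for a real PSI-BLAST profile so the example runs without external tools.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def one_hot_profile(sequence):
    """Stand-in for a PSI-BLAST profile: one row per residue, 20 columns.

    A real profile would hold normalized PSSM scores; one-hot rows are an
    assumption made only so this sketch runs without PSI-BLAST.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    profile = []
    for residue in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        row[index[residue]] = 1.0
        profile.append(row)
    return profile


def collocation_features(profile, max_gap=1):
    """Map (gap, aa_a, aa_b) to a length-normalized collocation score.

    For each gap g in 0..max_gap, position p is paired with p + g + 1 and
    the product of the two profile entries is accumulated for every
    ordered AA pair (hypothetical formulation, see the lead-in).
    """
    features = {}
    length = len(profile)
    for gap in range(max_gap + 1):
        offset = gap + 1
        n_pairs = max(length - offset, 0)
        for a, b in product(range(len(AMINO_ACIDS)), repeat=2):
            total = 0.0
            for p in range(n_pairs):
                total += profile[p][a] * profile[p + offset][b]
            key = (gap, AMINO_ACIDS[a], AMINO_ACIDS[b])
            features[key] = total / n_pairs if n_pairs > 0 else 0.0
    return features


# Usage: 400 ordered pairs per gap value, so max_gap=1 gives 800 features.
features = collocation_features(one_hot_profile("ACAC"), max_gap=1)
```

With one-hot rows the score reduces to the frequency of each gapped residue pair in the sequence; with a real profile, each position contributes fractionally to every pair, which is how evolutionary information would enter the feature vector passed to a classifier.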

Similar articles

Prediction of protein structural class for the twilight zone sequences.

Biochem Biophys Res Commun. 2007 Jun 1;357(2):453-60. doi: 10.1016/j.bbrc.2007.03.164. Epub 2007 Apr 5.

PFRES: protein fold classification by using evolutionary information and predicted secondary structure.

Bioinformatics. 2007 Nov 1;23(21):2843-50. doi: 10.1093/bioinformatics/btm475. Epub 2007 Oct 17.

Prediction of protein structural class for low-similarity sequences using support vector machine and PSI-BLAST profile.

Biochimie. 2010 Oct;92(10):1330-4. doi: 10.1016/j.biochi.2010.06.013. Epub 2010 Jun 23.

iFC²: an integrated web-server for improved prediction of protein structural class, fold type, and secondary structure content.

Amino Acids. 2011 Mar;40(3):963-73. doi: 10.1007/s00726-010-0721-1. Epub 2010 Aug 21.

Classifier ensembles for protein structural class prediction with varying homology.

Biochem Biophys Res Commun. 2006 Sep 29;348(3):981-8. doi: 10.1016/j.bbrc.2006.07.141. Epub 2006 Jul 31.

Prediction of integral membrane protein type by collocated hydrophobic amino acid pairs.

J Comput Chem. 2009 Jan 15;30(1):163-72. doi: 10.1002/jcc.21053.

High-accuracy prediction of protein structural class for low-similarity sequences based on predicted secondary structure.

Biochimie. 2011 Apr;93(4):710-4. doi: 10.1016/j.biochi.2011.01.001. Epub 2011 Jan 13.

Accurate prediction of protein structural class using auto covariance transformation of PSI-BLAST profiles.

Amino Acids. 2012 Jun;42(6):2243-9. doi: 10.1007/s00726-011-0964-5. Epub 2011 Jun 23.

A high-accuracy protein structural class prediction algorithm using predicted secondary structural information.

J Theor Biol. 2010 Dec 7;267(3):272-5. doi: 10.1016/j.jtbi.2010.09.007. Epub 2010 Sep 8.

Cited by

AAGP integrates physicochemical and compositional features for machine learning-based prediction of anti-aging peptides.

Sci Rep. 2025 Aug 8;15(1):29036. doi: 10.1038/s41598-025-12759-0.

A privacy-preserving approach for cloud-based protein fold recognition.

Patterns (N Y). 2024 Jul 19;5(9):101023. doi: 10.1016/j.patter.2024.101023. eCollection 2024 Sep 13.

ResNetKhib: a novel cell type-specific tool for predicting lysine 2-hydroxyisobutylation sites via transfer learning.

Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad063.

Fuzzy spherical truncation-based multi-linear protein descriptors: From their definition to application in structural-related predictions.

Front Chem. 2022 Oct 7;10:959143. doi: 10.3389/fchem.2022.959143. eCollection 2022.

BBPpredict: A Web Service for Identifying Blood-Brain Barrier Penetrating Peptides.

Front Genet. 2022 May 17;13:845747. doi: 10.3389/fgene.2022.845747. eCollection 2022.

Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences.

Comput Math Methods Med. 2021 May 7;2021:5529389. doi: 10.1155/2021/5529389. eCollection 2021.

Succinylation Site Prediction Based on Protein Sequences Using the IFS-LightGBM (BO) Model.

Comput Math Methods Med. 2020 Nov 10;2020:8858489. doi: 10.1155/2020/8858489. eCollection 2020.

Identifying Antioxidant Proteins by Using Amino Acid Composition and Protein-Protein Interactions.

Front Cell Dev Biol. 2020 Oct 29;8:591487. doi: 10.3389/fcell.2020.591487. eCollection 2020.

Prediction of protein structural classes by different feature expressions based on 2-D wavelet denoising and fusion.

BMC Bioinformatics. 2019 Dec 24;20(Suppl 25):701. doi: 10.1186/s12859-019-3276-5.

BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches.

Nucleic Acids Res. 2019 Nov 18;47(20):e127. doi: 10.1093/nar/gkz740.
