A highly accurate protein structural class prediction approach using auto cross covariance transformation and recursive feature elimination.

Author information

Li Xiaowei, Liu Taigang, Tao Peiying, Wang Chunhua, Chen Lanming

Affiliations

College of Food Science & Technology, Shanghai Ocean University, Shanghai 201306, China.

College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.

Publication information

Comput Biol Chem. 2015 Dec;59 Pt A:95-100. doi: 10.1016/j.compbiolchem.2015.08.012. Epub 2015 Sep 2.

DOI:10.1016/j.compbiolchem.2015.08.012

PMID:26460680

Abstract

Structural class characterizes the overall folding type of a protein or its domain. Many methods have been proposed in recent years to improve the prediction accuracy of protein structural class, but prediction remains challenging for low-similarity sequences. In this study, we introduce a feature extraction technique based on the auto cross covariance (ACC) transformation of the position-specific score matrix (PSSM) to represent a protein sequence. Support vector machine-recursive feature elimination (SVM-RFE) is then used to select the top K features according to their importance, and these features are input to a support vector machine (SVM) to make the prediction. The proposed method is evaluated with the jackknife test on three low-similarity datasets: D640, 1189 and 25PDB. It achieves overall accuracies of 97.2%, 96.2% and 93.3% on these three datasets, respectively, which are higher than those of most existing methods. This suggests that the proposed method could serve as a very cost-effective tool for predicting protein structural class, especially for low-similarity datasets.
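The ACC step described in the abstract can be sketched in pure Python. This is a minimal illustration, not the authors' code. It assumes the ACC formulation commonly applied to PSSMs: each column j is centred on its mean over the L residues, the auto covariance for lag lg is AC(j, lg) = 1/(L − lg) · Σ_{i=1}^{L−lg} (P[i][j] − mean_j)(P[i+lg][j] − mean_j), and the cross covariance CC(j1, j2, lg) pairs column j1 at residue i with column j2 at residue i + lg. The abstract does not state the maximum lag, whether the PSSM is normalized beforehand, or the feature ordering, so `max_lag`, the toy matrix and the helper names `acc_features` and `_lagged_cov` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def _lagged_cov(centred: Matrix, j1: int, j2: int, lag: int) -> float:
    """Covariance between column j1 at residue i and column j2 at residue i + lag."""
    pairs = len(centred) - lag  # number of residue pairs `lag` positions apart
    total = sum(centred[i][j1] * centred[i + lag][j2] for i in range(pairs))
    return total / pairs


def acc_features(pssm: Matrix, max_lag: int) -> List[float]:
    """Sketch of an assumed ACC encoding of an L x n PSSM (n = 20 for a standard PSSM).

    For each lag 1..max_lag: n auto covariances (same column), followed by
    n * (n - 1) cross covariances (ordered pairs of different columns).
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("PSSM must have at least one row")
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")
    n_cols = len(pssm[0])

    # Centre every column on its mean over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    centred = [[row[j] - means[j] for j in range(n_cols)] for row in pssm]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        # Auto covariance: the same amino-acid column at both positions.
        features.extend(_lagged_cov(centred, j, j, lag) for j in range(n_cols))
        # Cross covariance: every ordered pair of distinct columns.
        features.extend(
            _lagged_cov(centred, j1, j2, lag)
            for j1 in range(n_cols)
            for j2 in range(n_cols)
            if j1 != j2
        )
    return features


if __name__ == "__main__":
    # Toy 30 x 20 "PSSM" with made-up integer scores, for shape checking only.
    toy_pssm = [[float((i * 7 + j * 3) % 11 - 5) for j in range(20)] for i in range(30)]
    vector = acc_features(toy_pssm, max_lag=2)
    print(len(vector))  # 400 features per lag -> 800
```

With a standard 20-column PSSM, each lag yields 20 auto and 380 cross covariances, so the vector has 400 × max_lag entries. In the pipeline the abstract describes, this vector would next be ranked by SVM-RFE and the top K features passed to an SVM; those steps are not shown here.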

Similar articles

Prediction of protein structural class using tri-gram probabilities of position-specific scoring matrix and recursive feature elimination.

Amino Acids. 2015 Mar;47(3):461-8. doi: 10.1007/s00726-014-1878-9. Epub 2015 Jan 13.

Accurate prediction of protein structural class using auto covariance transformation of PSI-BLAST profiles.

Amino Acids. 2012 Jun;42(6):2243-9. doi: 10.1007/s00726-011-0964-5. Epub 2011 Jun 23.

Prediction of protein structural class for low-similarity sequences using support vector machine and PSI-BLAST profile.

Biochimie. 2010 Oct;92(10):1330-4. doi: 10.1016/j.biochi.2010.06.013. Epub 2010 Jun 23.

PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

PLoS One. 2014 Mar 27;9(3):e92863. doi: 10.1371/journal.pone.0092863. eCollection 2014.

Improving the prediction accuracy of protein structural class: approached with alternating word frequency and normalized Lempel-Ziv complexity.

J Theor Biol. 2014 Jan 21;341:71-7. doi: 10.1016/j.jtbi.2013.10.002. Epub 2013 Oct 17.

Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

Comput Math Methods Med. 2015;2015:370756. doi: 10.1155/2015/370756. Epub 2015 Dec 15.

A protein structural classes prediction method based on PSI-BLAST profile.

J Theor Biol. 2014 Jul 21;353:19-23. doi: 10.1016/j.jtbi.2014.02.034. Epub 2014 Mar 4.

Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach.

Int J Mol Sci. 2015 Dec 24;17(1):15. doi: 10.3390/ijms17010015.

Using principal component analysis and support vector machine to predict protein structural class for low-similarity sequences via PSSM.

J Biomol Struct Dyn. 2012;29(6):634-42. doi: 10.1080/07391102.2011.672627.

Cited by

DeepmRNALoc: A Novel Predictor of Eukaryotic mRNA Subcellular Localization Based on Deep Learning.

Molecules. 2023 Mar 1;28(5):2284. doi: 10.3390/molecules28052284.

Fused-Filament Fabrication of Short Carbon Fiber-Reinforced Polyamide: Parameter Optimization for Improved Performance under Uniaxial Tensile Loading.

Polymers (Basel). 2022 Mar 23;14(7):1292. doi: 10.3390/polym14071292.

Accurate Identification of Antioxidant Proteins Based on a Combination of Machine Learning Techniques and Hidden Markov Model Profiles.

Comput Math Methods Med. 2021 Aug 7;2021:5770981. doi: 10.1155/2021/5770981. eCollection 2021.

HMMPred: Accurate Prediction of DNA-Binding Proteins Based on HMM Profiles and XGBoost Feature Selection.

Comput Math Methods Med. 2020 Mar 28;2020:1384749. doi: 10.1155/2020/1384749. eCollection 2020.

ProTstab - predictor for cellular protein stability.

BMC Genomics. 2019 Nov 4;20(1):804. doi: 10.1186/s12864-019-6138-7.

Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE.

Genes (Basel). 2018 Jun 15;9(6):301. doi: 10.3390/genes9060301.

iAPSL-IF: Identification of Apoptosis Protein Subcellular Location Using Integrative Features Captured from Amino Acid Sequences.

Int J Mol Sci. 2018 Apr 13;19(4):1190. doi: 10.3390/ijms19041190.
