A combination of feature extraction methods with an ensemble of different classifiers for protein structural class prediction problem.

Authors

Dehzangi Abdollah, Paliwal Kuldip, Sharma Alok, Dehzangi Omid, Sattar Abdul

Affiliations

Griffith University, and National ICT Australia (NICTA), Brisbane.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):564-75. doi: 10.1109/TCBB.2013.65.

DOI:10.1109/TCBB.2013.65

PMID:24091391

Abstract

A better understanding of the structural class of a given protein reveals important information about its overall folding type and its domain. It can also be used directly to provide critical information on the general tertiary structure of a protein, which has a profound impact on protein function determination and drug design. Despite the substantial improvements made by pattern recognition-based approaches, the problem remains unsolved in bioinformatics and demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for the 15 most promising attributes, selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers, namely AdaBoost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM), we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.
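The two sequence-derived feature families named in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' definition: the Kyte-Doolittle hydropathy scale stands in for one of the 15 physicochemical attributes (the abstract does not list them), the autocorrelation is the common average-of-lagged-products form, and `segmented_distribution` is a simplified reading that scans from one end only and omits the overlap scheme. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as an example attribute.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def attribute_signal(sequence, scale=HYDROPATHY):
    """Map residues to attribute values, min-max normalised to [0, 1].

    Residues missing from the scale (e.g. 'X') are skipped.
    """
    lo, hi = min(scale.values()), max(scale.values())
    return [(scale[aa] - lo) / (hi - lo) for aa in sequence if aa in scale]


def autocorrelation(signal, max_lag):
    """Average product of values `lag` positions apart, for lag = 1..max_lag."""
    n = len(signal)
    features = []
    for lag in range(1, max_lag + 1):
        if lag >= n:
            features.append(0.0)  # sequence too short for this lag
            continue
        total = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


def segmented_distribution(signal, step_percent=25):
    """Fraction of the sequence needed to accumulate each step of the total.

    For thresholds step, 2*step, ... (below 100 percent) of the summed
    attribute value, return the relative position at which the running
    sum first reaches that threshold. Assumes non-negative values.
    """
    n = len(signal)
    total = sum(signal)
    thresholds = range(step_percent, 100, step_percent)
    if n == 0 or total == 0:
        return [0.0 for _ in thresholds]
    features = []
    for percent in thresholds:
        target = total * percent / 100
        running = 0.0
        for i, value in enumerate(signal):
            running += value
            if running >= target:
                features.append((i + 1) / n)
                break
        else:
            # Guard against floating-point shortfall at the end of the scan.
            features.append(1.0)
    return features


if __name__ == "__main__":
    signal = attribute_signal("MKTAYIAKQRQISFVKSHFSRQ")  # toy sequence
    feature_vector = autocorrelation(signal, 3) + segmented_distribution(signal)
    print(len(feature_vector), feature_vector)
```

Concatenating the two lists gives a fixed-length vector per attribute regardless of sequence length, which is what a downstream classifier such as an SVM or MLP requires. Repeating the computation for each attribute and for evolutionary (profile-based) values would extend the vector in the same way.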

Similar articles

A combination of feature extraction methods with an ensemble of different classifiers for protein structural class prediction problem.

IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):564-75. doi: 10.1109/TCBB.2013.65.

A Segmentation-Based Method to Extract Structural and Evolutionary Features for Protein Fold Recognition.

IEEE/ACM Trans Comput Biol Bioinform. 2014 May-Jun;11(3):510-9. doi: 10.1109/TCBB.2013.2296317.

Proposing a highly accurate protein structural class predictor using segmentation-based features.

BMC Genomics. 2014;15 Suppl 1(Suppl 1):S2. doi: 10.1186/1471-2164-15-S1-S2. Epub 2014 Jan 24.

Predicting protein structural class by SVM with class-wise optimized features and decision probabilities.

J Theor Biol. 2008 Jul 21;253(2):375-80. doi: 10.1016/j.jtbi.2008.02.031. Epub 2008 Mar 4.

A mixture of physicochemical and evolutionary-based feature extraction approaches for protein fold recognition.

Int J Data Min Bioinform. 2015;11(1):115-38. doi: 10.1504/ijdmb.2015.066359.

Mito-GSAAC: mitochondria prediction using genetic ensemble classifier and split amino acid composition.

Amino Acids. 2012 Apr;42(4):1443-54. doi: 10.1007/s00726-011-0888-0. Epub 2011 Mar 29.

Prediction of protein folds: extraction of new features, dimensionality reduction, and fusion of heterogeneous classifiers.

IEEE Trans Nanobioscience. 2009 Mar;8(1):100-10. doi: 10.1109/TNB.2009.2016488. Epub 2009 Mar 10.

Protein subcellular multi-localization prediction using a min-max modular support vector machine.

Int J Neural Syst. 2010 Feb;20(1):13-28. doi: 10.1142/S0129065710002206.

Advancing the Accuracy of Protein Fold Recognition by Utilizing Profiles From Hidden Markov Models.

IEEE Trans Nanobioscience. 2015 Oct;14(7):761-72. doi: 10.1109/TNB.2015.2457906. Epub 2015 Jul 20.

Predicting protein-protein interactions between human and hepatitis C virus via an ensemble learning method.

Mol Biosyst. 2014 Dec;10(12):3147-54. doi: 10.1039/c4mb00410h. Epub 2014 Sep 18.

Cited by

Mal-Light: Enhancing Lysine Malonylation Sites Prediction Problem Using Evolutionary-based Features.

IEEE Access. 2020;8:77888-77902. doi: 10.1109/access.2020.2989713. Epub 2020 Apr 22.

Accurate prediction of RNA 5-hydroxymethylcytosine modification by utilizing novel position-specific gapped k-mer descriptors.

Comput Struct Biotechnol J. 2020 Nov 12;18:3528-3538. doi: 10.1016/j.csbj.2020.10.032. eCollection 2020.

Accurately Predicting Glutarylation Sites Using Sequential Bi-Peptide-Based Evolutionary Features.

Genes (Basel). 2020 Aug 31;11(9):1023. doi: 10.3390/genes11091023.

Prediction of protein structural classes by different feature expressions based on 2-D wavelet denoising and fusion.

BMC Bioinformatics. 2019 Dec 24;20(Suppl 25):701. doi: 10.1186/s12859-019-3276-5.

Identification of Phage Viral Proteins With Hybrid Sequence Features.

Front Microbiol. 2019 Mar 26;10:507. doi: 10.3389/fmicb.2019.00507. eCollection 2019.

SumSec: Accurate Prediction of Sumoylation Sites Using Predicted Secondary Structure.

Molecules. 2018 Dec 10;23(12):3260. doi: 10.3390/molecules23123260.

Improving succinylation prediction accuracy by incorporating the secondary structure via helix, strand and coil, and evolutionary information from profile bigrams.

PLoS One. 2018 Feb 12;13(2):e0191900. doi: 10.1371/journal.pone.0191900. eCollection 2018.

A unified frame of predicting side effects of drugs by using linear neighborhood similarity.

BMC Syst Biol. 2017 Dec 14;11(Suppl 6):101. doi: 10.1186/s12918-017-0477-2.

Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach.

Int J Mol Sci. 2015 Dec 24;17(1):15. doi: 10.3390/ijms17010015.

Customised fragments libraries for protein structure prediction based on structural class annotations.

BMC Bioinformatics. 2015 Apr 29;16(1):136. doi: 10.1186/s12859-015-0576-2.
