

A combination of feature extraction methods with an ensemble of different classifiers for protein structural class prediction problem.

Authors

Dehzangi Abdollah, Paliwal Kuldip, Sharma Alok, Dehzangi Omid, Sattar Abdul

Affiliations

Griffith University, and National ICT Australia (NICTA), Brisbane.

Publication

IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):564-75. doi: 10.1109/TCBB.2013.65.

Abstract

A better understanding of the structural class of a given protein reveals important information about its overall folding type and its domain. It can also directly provide critical information about the general tertiary structure of the protein, which has a profound impact on protein function determination and drug design. Despite the substantial improvements achieved by pattern recognition-based approaches, the problem remains unsolved in bioinformatics and demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for the 15 most promising attributes, selected from a wide range of physicochemical attributes. Finally, by applying an ensemble of different classifiers, namely AdaBoost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM), we show improved protein structural class prediction accuracy on four popular benchmarks.
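The abstract does not define the autocorrelation-based features, so the following is only a minimal sketch of one common formulation, not the authors' method. It assumes each residue has already been mapped to a single numeric value for one physicochemical attribute, and that the feature for lag k is the average product of values k positions apart. The function name `autocorrelation_features`, the `max_lag` parameter, and the absence of any normalization are all assumptions made for illustration.

```python
from typing import List, Sequence


def autocorrelation_features(values: Sequence[float], max_lag: int) -> List[float]:
    """Return one autocorrelation feature per lag 1..max_lag.

    Assumed formulation (not taken from the paper):
        AC_k = (1 / (L - k)) * sum_{i=0}^{L-k-1} values[i] * values[i + k]
    where L is the sequence length and `values` holds one attribute
    value per residue.
    """
    length = len(values)
    if max_lag < 1 or max_lag >= length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(values)")

    features = []
    for lag in range(1, max_lag + 1):
        # Sum products of values that are `lag` positions apart.
        total = sum(values[i] * values[i + lag] for i in range(length - lag))
        # Average over the number of pairs available at this lag.
        features.append(total / (length - lag))
    return features


# Toy per-residue attribute values (hypothetical, not real protein data).
toy_values = [1.0, 2.0, 3.0, 4.0]
toy_features = autocorrelation_features(toy_values, max_lag=2)
```

Repeating this for each selected attribute and concatenating the results would give a fixed-length vector regardless of sequence length, which is what makes such features usable as classifier input.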

