An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins.

Authors

Nanni Loris, Lumini Alessandra

Affiliation

DEIS, IEIIT-CNR, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy.

Publication information

Amino Acids. 2009 Feb;36(2):167-75. doi: 10.1007/s00726-008-0044-7. Epub 2008 Feb 21.

Abstract

It is well established in the literature that an ensemble of classifiers generally outperforms a stand-alone method. Hence, it is important to develop ensemble methods well suited to bioinformatics data. In this work, we propose combining the feature extraction method based on grouped weight with a set of amino-acid alphabets obtained by a genetic algorithm. The proposed method is applied to predicting DNA-binding proteins. Two classifiers are tested: the linear support vector machine and the radial basis function support vector machine. Accuracy and the Matthews correlation coefficient are reported as performance indicators. With jackknife cross-validation, our ensemble method obtains a Matthews correlation coefficient of approximately 0.97. This result outperforms those reported in the literature on the same dataset, where the features are extracted directly from the amino-acid sequence.
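
To make two of the ideas in the abstract concrete, the following minimal Python sketch shows (a) how a protein sequence can be rewritten in a reduced amino-acid alphabet and (b) how the Matthews correlation coefficient is computed from confusion-matrix counts. The abstract does not list the alphabets found by the genetic algorithm or the details of the grouped-weight encoding, so the four-group alphabet `EXAMPLE_GROUPS` and the function names `reduce_sequence` and `matthews_cc` are illustrative assumptions, not the authors' implementation.

```python
from math import sqrt

# Hypothetical 4-letter reduced alphabet, chosen only for illustration.
# It is NOT one of the genetic-algorithm-derived alphabets from the paper.
EXAMPLE_GROUPS = {
    "h": "AVLIMFWC",  # largely hydrophobic residues
    "p": "STNQYGP",   # polar or structurally special residues
    "+": "KRH",       # positively charged residues
    "-": "DE",        # negatively charged residues
}

# Invert the grouping into a residue -> group-symbol lookup table.
RESIDUE_TO_GROUP = {
    residue: symbol
    for symbol, residues in EXAMPLE_GROUPS.items()
    for residue in residues
}


def reduce_sequence(sequence: str, unknown: str = "x") -> str:
    """Rewrite an amino-acid sequence using the reduced alphabet.

    Residues outside the 20 standard amino acids map to `unknown`.
    """
    return "".join(RESIDUE_TO_GROUP.get(r, unknown) for r in sequence.upper())


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (undefined denominator),
    which is a common convention.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    print(reduce_sequence("MKDE"))
    print(matthews_cc(tp=45, tn=40, fp=5, fn=10))
```

A coefficient of 1 indicates perfect agreement between predictions and labels, 0 indicates performance no better than chance, and -1 indicates total disagreement, which is why a value near 0.97 reflects very few misclassified proteins in both classes.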
