Prediction of mitochondrial proteins based on genetic algorithm-partial least squares and support vector machine

Authors

Tan F, Feng X, Fang Z, Li M, Guo Y, Jiang L

Affiliation

College of Chemistry, Sichuan University, Chengdu, China.

Publication information

Amino Acids. 2007 Nov;33(4):669-75. doi: 10.1007/s00726-006-0465-0. Epub 2007 Aug 15.

Abstract

Mitochondria are essential organelles of eukaryotic cells, so an automated and reliable method for the timely identification of novel mitochondrial proteins is important. In this study, mitochondrial proteins were encoded by their dipeptide composition. The genetic algorithm-partial least squares (GA-PLS) method was then used to identify the dipeptide composition elements that matter most for recognizing mitochondrial proteins, and the selected elements were used as input to support vector machine (SVM)-based classifiers that predict mitochondrial proteins. All models were trained and validated with the jackknife cross-validation test. The prediction accuracy was 85%, suggesting that the method performs reasonably well in predicting mitochondrial proteins. The results strongly imply that not all dipeptide composition elements are informative or indispensable for prediction. The MATLAB source code and the dataset are available on request from liml@scu.edu.cn.
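The dipeptide composition encoding mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code. It assumes the standard 20-letter amino acid alphabet, which gives 400 dipeptide features, and it assumes each count is normalized by the number of valid adjacent residue pairs. The names `AMINO_ACIDS`, `DIPEPTIDES`, `INDEX`, and `dipeptide_composition` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Assumption: the standard 20 amino acids, giving 20 * 20 = 400 dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dipeptide: i for i, dipeptide in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for one protein.

    Adjacent pairs containing a non-standard residue (for example 'X')
    are skipped. Frequencies are normalized by the number of valid pairs,
    which is an assumption of this sketch.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    # Slide a window of width 2 along the sequence.
    for first, second in zip(seq, seq[1:]):
        idx = INDEX.get(first + second)
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or without valid pairs: all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [count / total for count in counts]


features = dipeptide_composition("MKMK")
```

Each protein then becomes a fixed-length vector regardless of its sequence length. A feature selection step such as GA-PLS would keep a subset of these 400 columns before the classifier is trained.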
