Prediction of subcellular location of mycobacterial protein using feature selection techniques.

Affiliation

Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, 610054, Chengdu, China.

Publication information

Mol Divers. 2010 Nov;14(4):667-71. doi: 10.1007/s11030-009-9205-1. Epub 2009 Nov 12.

DOI:10.1007/s11030-009-9205-1

PMID:19908156

Abstract

Mycobacterium tuberculosis is the primary pathogen causing tuberculosis, one of the most prevalent infectious diseases. The subcellular location of mycobacterial proteins can provide essential clues for protein function research and drug discovery. It is therefore highly desirable to develop a computational method for fast and reliable prediction of the subcellular location of mycobacterial proteins. In this study, we developed a support vector machine (SVM) based method for this task. A total of 444 non-redundant mycobacterial proteins were used to train and test the proposed model with jackknife cross-validation. Using the traditional pseudo amino acid composition (PseAAC) as input parameters, an overall accuracy of 83.3% was achieved. Moreover, a feature selection technique was developed to find the optimal number of PseAAC features, which improved the overall accuracy from 83.3% to 87.2%. In addition, the reduced amino acids in the N-terminal and non-N-terminal regions of the proteins were combined in the models to further improve the prediction success rate. As a result, a maximum overall accuracy of 91.2% was achieved, with an average accuracy of 79.7%. The proposed model provides highly useful information for further experimental research. The prediction model can be accessed free of charge at http://cobi.uestc.edu.cn/cobi/people/hlin/webserver.
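
The pipeline described in the abstract (PseAAC features, jackknife validation, and a search for the best number of features) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: a single Kyte-Doolittle hydropathy scale stands in for the physicochemical properties used in PseAAC, a nearest-centroid classifier replaces the SVM so that the example needs only the standard library, and the feature ranking passed to `best_top_k` is supplied by the caller because the abstract does not specify the ranking criterion. All function names and default parameters (`lam`, `w`) are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumption: one scale used for brevity).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale):
    """Rescale a property table to zero mean and unit standard deviation."""
    mean = sum(scale.values()) / len(scale)
    sd = (sum((v - mean) ** 2 for v in scale.values()) / len(scale)) ** 0.5
    return {aa: (v - mean) / sd for aa, v in scale.items()}


H = _standardize(KD_HYDROPATHY)


def pseaac(seq, lam=2, w=0.05):
    """Return 20 composition terms plus `lam` sequence-order correlation terms."""
    residues = [aa for aa in seq.upper() if aa in H]
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    counts = Counter(residues)
    freqs = [counts[aa] / n for aa in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k apart.
    thetas = [
        sum((H[residues[i]] - H[residues[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (not an SVM): pick the class with the closest mean vector."""
    groups = {}
    for row, label in zip(train_X, train_y):
        groups.setdefault(label, []).append(row)
    best = None
    for label, rows in groups.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if best is None or dist < best[0]:
            best = (dist, label)
    return best[1]


def jackknife_accuracy(X, y, columns=None):
    """Leave each sample out once, train on the rest, and report the hit rate."""
    if columns is not None:
        X = [[row[c] for c in columns] for row in X]
    X, y = [list(row) for row in X], list(y)
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def best_top_k(X, y, ranking):
    """Try the top-1, top-2, ... ranked features and keep the smallest best subset."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranking) + 1):
        acc = jackknife_accuracy(X, y, ranking[:k])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```

To use the sketch, build `X = [pseaac(s) for s in sequences]` with one location label per sequence in `y`, then call `jackknife_accuracy(X, y)` for the full feature set or `best_top_k(X, y, ranking)` with a feature ranking of your choice. Swapping `nearest_centroid_predict` for an SVM would bring the sketch closer to the method named in the abstract.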
