Prediction of transporter family from protein sequence by support vector machine approach.

Author information

Lin H H, Han L Y, Cai C Z, Ji Z L, Chen Y Z

Affiliation

Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Singapore.

Publication information

Proteins. 2006 Jan 1;62(1):218-31. doi: 10.1002/prot.20605.

DOI:10.1002/prot.20605

PMID:16287089

Abstract

Transporters play key roles in cellular transport and metabolic processes, and in facilitating drug delivery and excretion. These proteins are classified into families based on the transporter classification (TC) system. Determination of the TC family of transporters facilitates the study of their cellular and pharmacological functions. Methods for predicting TC family without sequence alignments or clustering are particularly useful for studying novel transporters whose function cannot be determined by sequence similarity. This work explores the use of a machine learning method, support vector machines (SVMs), for predicting the family of transporters from their sequence without the use of sequence similarity. A total of 10,636 transporters in 13 TC subclasses, 1914 transporters in eight TC families, and 168,341 nontransporter proteins are used to train and test the SVM prediction system. Testing results by using a separate set of 4351 transporters and 83,151 nontransporter proteins show that the overall accuracy for predicting members of these TC subclasses and families is 83.4% and 88.0%, respectively, and that of nonmembers is 99.3% and 96.6%, respectively. The accuracies for predicting members and nonmembers of individual TC subclasses are in the range of 70.7-96.1% and 97.6-99.9%, respectively, and those of individual TC families are in the range of 60.6-97.1% and 91.5-99.4%, respectively. A further test by using 26,139 transmembrane proteins outside each of the 13 TC subclasses shows that 90.4-99.6% of these are correctly predicted. Our study suggests that the SVM is potentially useful for facilitating functional study of transporters irrespective of sequence similarity.
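The abstract reports accuracy separately for members and nonmembers of each TC subclass or family, and describes prediction from sequence without alignment. The following is a minimal sketch of those two ideas, not the authors' implementation. It assumes amino acid composition as a stand-in for the sequence-derived features (the abstract does not specify which descriptors were used), and the function names `composition` and `member_nonmember_accuracy` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acid one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in a sequence.

    Assumption: composition is used here only as a simple example of an
    alignment-free feature vector; it is not taken from the abstract.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def member_nonmember_accuracy(y_true, y_pred):
    """Return (member accuracy, nonmember accuracy) for labels 1 = member, 0 = nonmember.

    Member accuracy is the fraction of true members predicted as members;
    nonmember accuracy is the fraction of true nonmembers predicted as nonmembers.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one member and one nonmember")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    features = composition("MKTLLAAC")  # toy sequence, not real data
    print(len(features))
    print(member_nonmember_accuracy([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

Fixed-length vectors of this kind could be passed to any binary classifier, for example one SVM per TC family trained on members versus nonmembers. Reporting the two accuracies separately matters here because nonmembers greatly outnumber members in the data sets described, so a single pooled accuracy would be dominated by the nonmember class.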
