Prediction of transporter family from protein sequence by support vector machine approach.

Authors

Lin H H, Han L Y, Cai C Z, Ji Z L, Chen Y Z

Affiliation

Bioinformatics and Drug Design Group, Department of Computational Science, National University of Singapore, Singapore.

Publication

Proteins. 2006 Jan 1;62(1):218-31. doi: 10.1002/prot.20605.

Abstract

Transporters play key roles in cellular transport and metabolic processes, and in facilitating drug delivery and excretion. These proteins are classified into families based on the transporter classification (TC) system. Determination of the TC family of transporters facilitates the study of their cellular and pharmacological functions. Methods for predicting TC family without sequence alignments or clustering are particularly useful for studying novel transporters whose function cannot be determined by sequence similarity. This work explores the use of a machine learning method, support vector machines (SVMs), for predicting the family of transporters from their sequence without the use of sequence similarity. A total of 10,636 transporters in 13 TC subclasses, 1914 transporters in eight TC families, and 168,341 nontransporter proteins are used to train and test the SVM prediction system. Testing results by using a separate set of 4351 transporters and 83,151 nontransporter proteins show that the overall accuracy for predicting members of these TC subclasses and families is 83.4% and 88.0%, respectively, and that of nonmembers is 99.3% and 96.6%, respectively. The accuracies for predicting members and nonmembers of individual TC subclasses are in the range of 70.7-96.1% and 97.6-99.9%, respectively, and those of individual TC families are in the range of 60.6-97.1% and 91.5-99.4%, respectively. A further test by using 26,139 transmembrane proteins outside each of the 13 TC subclasses shows that 90.4-99.6% of these are correctly predicted. Our study suggests that the SVM is potentially useful for facilitating functional study of transporters irrespective of sequence similarity.
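The abstract reports two figures per TC subclass or family: an accuracy for members and an accuracy for nonmembers. The sketch below shows one way such figures could be computed for a single binary classifier, assuming that member accuracy means the percentage of true members predicted as members and nonmember accuracy means the percentage of true nonmembers predicted as nonmembers. That reading is an assumption, and the names `Counts`, `tally`, `member_accuracy`, and `nonmember_accuracy` are hypothetical, not taken from the paper. The labels are toy values, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one family-vs-rest classifier (hypothetical helper)."""
    tp: int  # members predicted as members
    fn: int  # members predicted as nonmembers
    tn: int  # nonmembers predicted as nonmembers
    fp: int  # nonmembers predicted as members


def tally(y_true, y_pred):
    """Count outcomes; label 1 = member of the family, 0 = nonmember."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    c = Counts(tp=0, fn=0, tn=0, fp=0)
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            c.tp += 1
        elif truth == 1:
            c.fn += 1
        elif pred == 0:
            c.tn += 1
        else:
            c.fp += 1
    return c


def member_accuracy(c):
    """Percentage of true members that were predicted as members."""
    return 100.0 * c.tp / (c.tp + c.fn)


def nonmember_accuracy(c):
    """Percentage of true nonmembers that were predicted as nonmembers."""
    return 100.0 * c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    # Toy labels for illustration only (assumption, not study data).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    counts = tally(y_true, y_pred)
    print(f"member accuracy:    {member_accuracy(counts):.1f}%")
    print(f"nonmember accuracy: {nonmember_accuracy(counts):.1f}%")
```

Reporting the two figures separately matters here because nonmembers vastly outnumber members in the test set (83,151 nontransporter proteins versus 4351 transporters), so a single pooled accuracy would be dominated by the nonmember class.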

