Li Haiquan, Dai Xinbin, Zhao Xuechun
Bioinformatics Lab, Plant Biology Division, The Samuel Roberts Noble Foundation, Inc., 2510 Sam Noble Parkway, Ardmore, OK 73401, USA.
Bioinformatics. 2008 May 1;24(9):1129-36. doi: 10.1093/bioinformatics/btn099. Epub 2008 Mar 12.
Membrane transport proteins play a crucial role in the import and export of ions, small molecules and macromolecules across biological membranes. Few published computational tools currently support the systematic discovery and categorization of transporters ahead of costly experimental validation. To address this problem, we used a nearest neighbor method that integrates homology search and topological analysis into a machine-learning framework.
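As an illustration only, the idea of a nearest neighbor classifier that combines a homology signal with a topology signal might be sketched as follows. This is not the authors' implementation: the k-mer Jaccard score is a crude stand-in for a real homology search, the transmembrane segment counts are assumed to come from an external topology predictor, and the names `Protein`, `similarity`, `classify`, `topo_weight` and `min_score`, along with the toy sequences and thresholds, are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Protein:
    name: str
    kmers: FrozenSet[str]    # sequence k-mers, a stand-in for homology evidence
    tm_segments: int         # assumed output of a topology predictor
    family: Optional[str]    # known family for references, None for queries


def make_protein(name: str, seq: str, tm_segments: int,
                 family: Optional[str] = None, k: int = 3) -> Protein:
    """Build a Protein from a raw sequence by collecting its distinct k-mers."""
    kmers = frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))
    return Protein(name, kmers, tm_segments, family)


def similarity(a: Protein, b: Protein, topo_weight: float = 0.3) -> float:
    """Weighted mix of k-mer Jaccard similarity and topology agreement."""
    union = a.kmers | b.kmers
    homology = len(a.kmers & b.kmers) / len(union) if union else 0.0
    topology = 1.0 / (1 + abs(a.tm_segments - b.tm_segments))
    return (1 - topo_weight) * homology + topo_weight * topology


def classify(query: Protein, references: List[Protein],
             min_score: float = 0.5) -> Tuple[Optional[str], float]:
    """Assign the family of the nearest reference, or None if too dissimilar."""
    best = max(references, key=lambda ref: similarity(query, ref))
    score = similarity(query, best)
    return (best.family if score >= min_score else None, score)


# Toy data: sequences and segment counts are invented for illustration.
references = [
    make_protein("ref_a", "MKTLLVAGGLLA", 12, family="2.A.1"),
    make_protein("ref_b", "MSDQPRRWEEYT", 6, family="3.A.1"),
]
likely_transporter = make_protein("query_1", "MKTLLVAGGLLV", 12)
unlikely_transporter = make_protein("query_2", "PPPPPPPPPPPP", 0)

family_1, score_1 = classify(likely_transporter, references)
family_2, score_2 = classify(unlikely_transporter, references)
```

In this sketch a query with no sufficiently close reference is left unclassified, which mirrors the need to separate transporters from non-transporters before assigning a family. How the two signals are actually weighted and thresholded in the published method is not stated in the abstract.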
Our approach distinguished 484 transporter families in the Transporter Classification Database, a curated and representative database of transporters. Five-fold cross-validation on the database achieved an average positive classification rate of 72.3%. The method also detected transporters in seven model and four non-model organisms, ranging from archaeal to mammalian species. A preliminary literature-based validation confirmed 65.8% of our predictions across the 11 organisms; this includes the 55.9% of our predictions that overlap with 83.6% of the transporters predicted in TransportDB.
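The two overlap percentages use different denominators: one is the shared fraction of our own predictions, the other is the shared fraction of the TransportDB predictions. A minimal sketch of that arithmetic, using invented protein identifiers rather than data from the study (the function name `overlap_rates` is hypothetical), is:

```python
from typing import Set, Tuple


def overlap_rates(ours: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Return (shared / |ours|, shared / |reference|) for two prediction sets."""
    if not ours or not reference:
        raise ValueError("both prediction sets must be non-empty")
    shared = ours & reference
    return len(shared) / len(ours), len(shared) / len(reference)


# Toy identifiers, invented for illustration only.
our_predictions = {"p1", "p2", "p3", "p4", "p5"}
reference_predictions = {"p1", "p2", "p3", "p9"}

frac_of_ours, frac_of_reference = overlap_rates(our_predictions,
                                                reference_predictions)
```

Here three shared proteins make up 60% of the five toy predictions but 75% of the four reference predictions. The same asymmetry explains why 55.9% and 83.6% can describe a single overlap.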