Reddy Abhinay, Cho Jaehoon, Ling Sam, Reddy Vamsee, Shlykov Maksim, Saier Milton H
Department of Molecular Biology, University of California at San Diego, La Jolla, Calif., USA.
J Mol Microbiol Biotechnol. 2014;24(3):161-90. doi: 10.1159/000363506. Epub 2014 Jun 27.
We evaluated the topological predictions of nine programs: HMMTOP, TMHMM, SVMTOP, DAS, SOSUI, TOPCONS, PHOBIUS, MEMSAT-SVM (hereinafter referred to as MEMSAT), and SPOCTOPUS. These programs were first evaluated using four large, topologically well-defined families of secondary transporters, and the three best programs were further evaluated using topologically more diverse families of channels and carriers. In the initial studies, the order of accuracy was: SPOCTOPUS > MEMSAT > HMMTOP > TOPCONS > PHOBIUS > TMHMM > SVMTOP > DAS > SOSUI. Some families, such as the Sugar Porter Family (TC #2.A.1.1) of the Major Facilitator Superfamily (MFS; TC #2.A.1) and the Amino Acid/Polyamine/Organocation (APC) Family (TC #2.A.3), were predicted with high accuracy, while others, such as the Mitochondrial Carrier (MC) Family (TC #2.A.29) and the K⁺ transporter (Trk) Family (TC #2.A.38), were predicted with much lower accuracy. For small, topologically homogeneous families, SPOCTOPUS and MEMSAT were generally most reliable, while for large, more diverse superfamilies, HMMTOP often proved to have the greatest prediction accuracy. We next developed a novel program, TM-STATS, that tabulates HMMTOP-, SPOCTOPUS- or MEMSAT-based topological predictions for any subdivision (class, subclass, superfamily, family, subfamily, or any combination of these) of the Transporter Classification Database (TCDB; www.tcdb.org), and we used it to examine the following subclasses: α-type channel proteins (TC subclasses 1.A and 1.E), secreted pore-forming toxins (TC subclass 1.C) and secondary carriers (TC subclass 2.A). Histograms were generated for each of these subclasses, and the results were analyzed according to subclass, family and protein. The results provide an update of topological predictions for integral membrane transport proteins as well as guides for the development of more reliable topological prediction programs that take family-specific characteristics into account.
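The tabulation step described above can be illustrated with a minimal Python sketch. It is not the TM-STATS implementation, whose internals are not given here. It assumes that each protein is represented by its TC identifier and a predicted number of transmembrane segments (TMSs), and that a subdivision is selected by matching the leading dot-separated fields of the identifier. The function name `tms_histogram` and the input values are hypothetical placeholders, not results from the study.

```python
from collections import Counter


def tms_histogram(predictions, subdivision):
    """Count proteins per predicted TMS number within one TC subdivision.

    predictions: iterable of (tc_id, n_tms) pairs, e.g. ("2.A.1.1.1", 12).
    subdivision: TC prefix such as "2.A" (subclass) or "2.A.1" (superfamily).
    """
    wanted = subdivision.split(".")
    counts = Counter()
    for tc_id, n_tms in predictions:
        # Compare whole fields so that "2.A.1" does not match "2.A.12".
        if tc_id.split(".")[: len(wanted)] == wanted:
            counts[n_tms] += 1
    return dict(sorted(counts.items()))


# Placeholder input for illustration only (assumed values, not real predictions).
toy_predictions = [
    ("2.A.1.1.1", 12),
    ("2.A.1.1.2", 12),
    ("2.A.1.2.1", 11),
    ("2.A.12.1.1", 10),
    ("2.A.29.1.1", 6),
    ("1.A.1.1.1", 2),
]

if __name__ == "__main__":
    # Text histogram for one subdivision: one row per predicted TMS count.
    for n_tms, n_proteins in tms_histogram(toy_predictions, "2.A.1").items():
        print(f"{n_tms:>2} TMSs | {'#' * n_proteins} ({n_proteins})")
```

Passing a shorter prefix such as "2.A" pools every family in that subclass, while a longer prefix such as "2.A.1.1" restricts the count to a single family, which mirrors the subclass, family and protein levels of analysis mentioned above.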