Andrew G Garrow, Alison Agnew, David R Westhead
School of Biochemistry and Microbiology, University of Leeds, Leeds, LS2 9JT, UK.
BMC Bioinformatics. 2005 Mar 15;6:56. doi: 10.1186/1471-2105-6-56.
Beta-barrel transmembrane (bbtm) proteins are a functionally important and diverse group of proteins expressed in the outer membranes of bacteria (both Gram-negative and acid-fast Gram-positive), mitochondria and chloroplasts. Although recent publications report reasonable accuracy in discriminating between bbtm proteins and other proteins, screening entire genomes remains troublesome because these molecules constitute only a small fraction of the sequences screened. Novel methods capable of detecting new families of bbtm proteins in diverse genomes are therefore still required.
We present TMB-Hunt, a program that uses a k-Nearest Neighbour (k-NN) algorithm to discriminate between bbtm and non-bbtm proteins on the basis of their amino acid composition. By differentially weighting amino acids, including evolutionary information and calibrating the scoring, we achieved an accuracy of 92.5%, with 91% sensitivity and 93.8% positive predictive value (PPV), under a rigorous cross-validation procedure. A major advantage of this approach is that, because it does not rely on beta-strand detection, it does not require resolved structures, so larger and more representative training sets can be used. We therefore believe that this approach will be valuable in complementing other, physicochemical and homology-based, methods. This was demonstrated by the correct reassignment of a number of proteins that other predictors failed to classify. We have used the algorithm to screen several genomes and discuss our findings.
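To make the classification idea concrete, the following is a minimal sketch of composition-based k-NN scoring, not TMB-Hunt's implementation. It assumes a weighted Euclidean distance over the 20 standard amino acid fractions, a score equal to the fraction of the k nearest training sequences labelled bbtm, and caller-supplied per-residue weights. The function names (`composition`, `weighted_distance`, `knn_score`, `metrics`) are hypothetical, and the paper's actual weighting scheme, use of evolutionary information and score calibration are not reproduced. The `metrics` helper uses the standard definitions of accuracy, sensitivity and PPV from confusion-matrix counts.

```python
from collections import Counter
import math

# The 20 standard amino acids; any other character (e.g. X) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fraction of each standard amino acid in seq."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance between two composition vectors (assumed metric)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def knn_score(query, training, k, weights):
    """Fraction of the k nearest training sequences labelled bbtm (label True).

    training is a list of (sequence, is_bbtm) pairs; this simple vote fraction
    is a hypothetical stand-in for TMB-Hunt's calibrated score.
    """
    q = composition(query)
    ranked = sorted(
        (weighted_distance(q, composition(seq), weights), label)
        for seq, label in training
    )
    nearest = ranked[:k]
    return sum(1 for _, label in nearest if label) / len(nearest)


def metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and positive predictive value from counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return accuracy, sensitivity, ppv
```

For example, with uniform weights `[1.0] * 20` and a toy training set of made-up sequences, a query whose composition resembles the bbtm examples scores close to 1. A threshold on this score would then separate predicted bbtm from non-bbtm proteins. Raising the weight of a residue makes differences in that residue's frequency count more heavily towards the distance.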
TMB-Hunt achieves higher prediction accuracy than other approaches published to date. Results were significantly enhanced by the use of evolutionary information and a system for calibrating k-NN scoring. Because the program takes an approach distinct from those of other discriminators, and therefore has different limitations, we believe it will make a significant contribution to the development of a consensus approach for bbtm protein detection.