Cserzö Miklos, Eisenhaber Frank, Eisenhaber Birgit, Simon Istvan
University of Birmingham, School of Biosciences, Edgbaston, Birmingham B15 2TT, UK.
Protein Eng. 2002 Sep;15(9):745-52. doi: 10.1093/protein/15.9.745.
While helical transmembrane (TM) region prediction tools achieve high (>90%) success rates for real integral membrane proteins, they produce a considerable number of false positive hits when the query sequences are known non-transmembrane proteins. We propose a modification of the dense alignment surface (DAS) method that achieves a substantial decrease in the false positive error rate. Essentially, a sequence that includes possible transmembrane regions is compared in a second step with TM segments in a sequence library of documented transmembrane proteins. If the score of the query sequence against this library of documented TM segment-containing sequences is lower than an empirical threshold, the query is classified as a non-transmembrane protein. The probability of false positive prediction for trusted TM region hits is expressed in terms of E-values. The modified DAS method, the DAS-TMfilter algorithm, has an unchanged high sensitivity for TM segments (approximately 95% detected in a learning set of 128 documented transmembrane proteins). At the same time, the selectivity measured over a non-redundant set of 526 soluble proteins with known 3D structure is approximately 99%, mainly because a large number of falsely predicted single membrane-pass proteins are eliminated by the DAS-TMfilter algorithm.
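The two-step decision described above can be illustrated with a minimal Python sketch. It is not the DAS-TMfilter implementation: the residue classes, the `similarity` function, the example sequences, and the threshold value are all hypothetical stand-ins chosen for illustration, and the real method uses DAS scoring and an empirically derived threshold that the abstract does not specify. The sketch only shows the control flow: candidate TM segments from a first pass are scored against a library of documented TM segments, and the query is rejected when its best score falls below the threshold.

```python
from typing import List

# Toy residue class; an assumption for illustration, NOT the DAS scoring scheme.
HYDROPHOBIC = set("AILMFVWC")


def similarity(a: str, b: str) -> float:
    """Fraction of aligned positions where both residues share the same toy class."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    # zip stops at the shorter sequence, so exactly n positions are compared.
    matches = sum((x in HYDROPHOBIC) == (y in HYDROPHOBIC) for x, y in zip(a, b))
    return matches / n


def library_score(candidate: str, library: List[str]) -> float:
    """Best similarity of one candidate segment against all library TM segments."""
    return max((similarity(candidate, seg) for seg in library), default=0.0)


def classify(candidates: List[str], library: List[str], threshold: float) -> str:
    """Second-step filter: reject the query if its best library score is below threshold."""
    if not candidates:
        return "non-transmembrane"
    best = max(library_score(c, library) for c in candidates)
    return "transmembrane" if best >= threshold else "non-transmembrane"


# Hypothetical data: one library TM segment and two first-pass candidates.
library = ["LLIVAFLLGAVIAGLL"]
tm_like = ["ILLVAFALGIVLAGML"]
polar = ["DEKRSTNQDEKRSTNQ"]
threshold = 0.8  # hypothetical value; the paper's threshold is empirical

print(classify(tm_like, library, threshold))
print(classify(polar, library, threshold))
```

In this toy setup the hydrophobic candidate passes the filter and the polar candidate is rejected, which mirrors how the second step is meant to remove false positive hits without discarding real TM segments.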