Department of Biology, University of Padua, Viale G. Colombo 3, I-35131 Padova, Italy.
Bioinformatics. 2012 Feb 15;28(4):503-9. doi: 10.1093/bioinformatics/btr682. Epub 2011 Dec 20.
Intrinsically disordered regions are key for the function of numerous proteins, and the scant available experimental annotations suggest the existence of different disorder flavors. While efficient predictions are required to annotate entire genomes, most existing methods require sequence profiles for disorder prediction, making them cumbersome for high-throughput applications.
In this work, we present an ensemble of protein disorder predictors called ESpritz. These are based on bidirectional recursive neural networks and trained on three different flavors of disorder, including a novel NMR flexibility predictor. ESpritz can produce fast and accurate sequence-only predictions, annotating entire genomes on the order of hours on a single processor core. Alternatively, a slower but slightly more accurate ESpritz variant using sequence profiles can be used for applications requiring maximum performance. Two levels of prediction confidence make it possible either to maximize reasonable disorder detection or to limit expected false positives to 5%. ESpritz performs consistently well on the recent CASP9 data, reaching an S(w) measure of 54.82 and an area under the receiver operating characteristic curve of 0.856. The fast predictor is four orders of magnitude faster and remains better than most publicly available CASP9 methods, making it ideal for genome-scale predictions.
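The second confidence level, which limits expected false positives to 5%, can be illustrated with a minimal sketch. It assumes per-residue disorder scores and reference labels (1 for disordered, 0 for ordered) are available from some validation set. The function names `threshold_for_fpr` and `false_positive_rate` are hypothetical and are not part of ESpritz; this is one common way to pick such a cutoff, not necessarily the procedure used by the authors.

```python
def threshold_for_fpr(scores, labels, max_fpr=0.05):
    """Pick a score cutoff so that at most max_fpr of ordered residues
    are predicted as disordered (prediction rule: score > cutoff).

    scores: per-residue disorder scores (higher means more disordered)
    labels: reference labels, 1 = disordered, 0 = ordered
    """
    # Scores of truly ordered residues, highest first.
    ordered = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not ordered:
        raise ValueError("need at least one ordered residue")
    if not 0.0 <= max_fpr < 1.0:
        raise ValueError("max_fpr must be in [0, 1)")
    # Number of ordered residues we may mislabel as disordered.
    allowed = int(max_fpr * len(ordered))
    # At most `allowed` ordered scores are strictly above this cutoff.
    return ordered[allowed]


def false_positive_rate(scores, labels, threshold):
    """Fraction of ordered residues whose score exceeds the threshold."""
    ordered = [s for s, y in zip(scores, labels) if y == 0]
    false_positives = sum(1 for s in ordered if s > threshold)
    return false_positives / len(ordered)
```

A residue would then be called disordered when its score is strictly greater than the returned cutoff. Lowering `max_fpr` trades sensitivity for fewer false positives, which mirrors the choice between the two confidence levels described above.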
ESpritz predicts three flavors of disorder at two distinct false positive rates, using either a fast approach or a slower, slightly more accurate one. Given its state-of-the-art performance, it can be especially useful for high-throughput applications.
Both a web server for high-throughput analysis and a Linux executable version of ESpritz are available from: http://protein.bio.unipd.it/espritz/.