Biomolecular Function Discovery Division, Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Matrix, Singapore.
BMC Genomics. 2010 Feb 10;11 Suppl 1(Suppl 1):S15. doi: 10.1186/1471-2164-11-S1-S15.
Algorithms that predict protein disorder play an important role in structural and functional genomics, because disordered regions have been reported to participate in important cellular processes. Consequently, various groups have independently developed several disorder prediction methods based on different underlying principles. To assess their usability in automated workflows, we sought to identify parameter settings and thresholds under which the performance of these predictors becomes directly comparable.
First, we derived a new benchmark set that accounts for different flavours of disorder and complements them with a similar amount of order annotation derived for the same protein set. We show that, with the recommended default parameters, the tested programs produce a wide range of predictions at different levels of specificity and sensitivity. We identify settings under which the different predictors have the same false positive rate. We also assess the conditions under which sets of predictors can be run together to derive consensus or complementary predictions. This is useful for proteome-wide applications where high specificity is required, such as our in-house sequence analysis pipeline and the ANNIE webserver.
This work identifies parameter settings and thresholds for a selection of disorder predictors to produce comparable results at a desired level of specificity over a newly derived benchmark dataset that accounts equally for ordered and disordered regions of different lengths.
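The idea of running different predictors at the same false positive rate and then combining them can be illustrated with a minimal sketch. It is not the procedure used in this work: the function names, the two toy predictors, their score scales, the 20% target rate and the voting rule are all assumptions chosen for illustration. It assumes each predictor emits one score per residue, that a residue is called disordered when its score is at or above a cutoff, and that scores for residues annotated as ordered are available from a benchmark.

```python
import math


def threshold_for_fpr(ordered_scores, target_fpr):
    """Lowest observed cutoff whose false positive rate stays within target_fpr.

    ordered_scores: scores a predictor assigns to residues annotated as ORDERED.
    A residue is called disordered when score >= cutoff, so every ordered
    residue at or above the cutoff counts as a false positive.
    Returns math.inf if no observed cutoff meets the target.
    """
    n = len(ordered_scores)
    if n == 0:
        raise ValueError("need at least one ordered-residue score")
    best = math.inf
    # Walk candidate cutoffs from strictest to most permissive.
    for cutoff in sorted(set(ordered_scores), reverse=True):
        false_positives = sum(1 for s in ordered_scores if s >= cutoff)
        if false_positives / n <= target_fpr:
            best = cutoff
        else:
            break  # any lower cutoff can only add false positives
    return best


def consensus(score_tracks, thresholds, min_votes):
    """Per-residue consensus call: True where at least min_votes predictors
    score the residue at or above their own calibrated cutoff."""
    length = len(score_tracks[0])
    calls = []
    for i in range(length):
        votes = sum(
            1 for track, cutoff in zip(score_tracks, thresholds)
            if track[i] >= cutoff
        )
        calls.append(votes >= min_votes)
    return calls


# Hypothetical scores on ordered benchmark residues for two toy predictors
# that report on different scales.
ordered_a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.05, 0.15, 0.25, 0.35, 0.7]
ordered_b = [-2.0, -1.0, 0.5, -0.5, 1.5, -1.5, 0.0, -3.0, 2.0, -0.2]

# Calibrate both predictors to the same false positive rate (20% here).
cut_a = threshold_for_fpr(ordered_a, 0.2)
cut_b = threshold_for_fpr(ordered_b, 0.2)

# Hypothetical scores for a four-residue query sequence.
query_a = [0.9, 0.65, 0.2, 0.6]
query_b = [1.8, 0.1, 2.5, 1.5]

strict = consensus([query_a, query_b], [cut_a, cut_b], min_votes=2)
union = consensus([query_a, query_b], [cut_a, cut_b], min_votes=1)
print(cut_a, cut_b, strict, union)
```

Requiring all predictors to agree (`min_votes` equal to the number of predictors) favours specificity, whereas `min_votes=1` takes the union of calls and corresponds to using the predictors in a complementary way. Which setting is appropriate depends on the application.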