Xue Bin, Dunbrack Roland L, Williams Robert W, Dunker A Keith, Uversky Vladimir N
Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
Biochim Biophys Acta. 2010 Apr;1804(4):996-1010. doi: 10.1016/j.bbapap.2010.01.011. Epub 2010 Jan 25.
Protein intrinsic disorder is increasingly recognized in proteomics research. Although they lack stable structure, many disordered regions have been associated with biological function. Many experimental methods exist for characterizing intrinsically disordered proteins and regions; nevertheless, predicting intrinsic disorder from amino acid sequence remains a useful strategy, especially for large-scale proteomic investigations. Here we introduce a consensus artificial neural network (ANN) prediction method, developed by combining the outputs of several individual disorder predictors. In eight-fold cross-validation, this meta-predictor, called PONDR-FIT, improved prediction accuracy over the individual predictors by 3 to 20%, with an average of 11%, depending on the dataset used. Error analysis shows that accuracy remains worst for short disordered regions of fewer than ten residues and for residues close to order/disorder boundaries. A better understanding of the mechanism by which such meta-predictors improve predictions will likely promote the further development of protein disorder predictors. PONDR-FIT is available at www.disprot.org.
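The consensus idea described in the abstract can be illustrated with a minimal sketch: a small feed-forward network takes the per-residue scores of several component predictors as input and is evaluated by eight-fold cross-validation. Everything below is an assumption made for illustration and is not the PONDR-FIT implementation: the three component predictors are simulated as noisy scores in [0, 1], the network size, learning rate, and epoch count are arbitrary, and folds are drawn over individual residues rather than over proteins. The helper names `k_fold_indices`, `forward`, and `train_ann` are hypothetical.

```python
import numpy as np

def k_fold_indices(n_samples, k=8, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def forward(params, X):
    """One tanh hidden layer followed by a sigmoid output unit."""
    W1, b1, W2, b2 = params
    h = np.tanh((X - 0.5) @ W1 + b1)          # centre scores around 0.5
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # estimated P(disordered)
    return h, p

def train_ann(X, y, hidden=4, epochs=2000, lr=0.5, seed=0):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        h, p = forward((W1, b1, W2, b2), X)
        g_out = (p - y) / len(y)                      # dLoss/d(output logit)
        g_hid = np.outer(g_out, W2) * (1.0 - h ** 2)  # back-propagate through tanh
        W2 = W2 - lr * (h.T @ g_out)
        b2 = b2 - lr * g_out.sum()
        W1 = W1 - lr * ((X - 0.5).T @ g_hid)
        b1 = b1 - lr * g_hid.sum(axis=0)
    return W1, b1, W2, b2

# Synthetic data (assumption): 1 = disordered residue, 0 = ordered residue.
rng = np.random.default_rng(1)
n_residues, n_predictors = 800, 3
y = rng.integers(0, 2, n_residues).astype(float)
signal = 0.5 + 0.25 * (2.0 * y[:, None] - 1.0)
# Each simulated component predictor reports the signal plus independent noise.
X = np.clip(signal + rng.normal(0.0, 0.3, (n_residues, n_predictors)), 0.0, 1.0)

# Accuracy of each simulated component predictor thresholded at 0.5.
single_acc = ((X > 0.5) == (y[:, None] > 0.5)).mean(axis=0)

# Eight-fold cross-validation of the consensus network.
folds = k_fold_indices(n_residues, k=8)
fold_acc = []
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    params = train_ann(X[train_idx], y[train_idx])
    _, p = forward(params, X[test_idx])
    fold_acc.append(((p > 0.5) == (y[test_idx] > 0.5)).mean())
cv_acc = float(np.mean(fold_acc))

print("single-predictor accuracy:", np.round(single_acc, 3))
print("consensus ANN accuracy (8-fold CV):", round(cv_acc, 3))
```

In this toy setting the gain comes from averaging out independent errors across the component predictors. Real predictors have correlated errors, and residues within one protein are not independent, so a realistic evaluation would more likely split folds by protein than by residue.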