Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA 19122, USA.
Proteome Sci. 2011 Oct 14;9 Suppl 1(Suppl 1):S12. doi: 10.1186/1477-5956-9-S1-S12.
Studies of intrinsically disordered proteins, which lack a stable tertiary structure but still have important biological functions, rely critically on computational methods that predict this property from sequence information. Although a number of fairly successful models for predicting protein disorder have been developed over the last decade, the quality of their predictions is limited by the availability of cases with confirmed disorder.
To estimate protein disorder from protein sequences more reliably, an iterative algorithm is proposed that integrates the predictions of multiple disorder models without relying on any protein sequences with confirmed disorder annotation. The iterative method alternates between the maximum a posteriori (MAP) estimation of the disorder prediction and the maximum-likelihood (ML) estimation of the quality of the individual disorder predictors. Experiments on data used at CASP7, CASP8, and CASP9 have shown the effectiveness of the proposed algorithm.
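The alternating scheme can be illustrated with a minimal sketch. The abstract does not specify the underlying model, so everything below is an assumption made for illustration: each predictor emits a binary label per residue, predictors err independently given the true state, and predictor quality is summarized by a sensitivity and a specificity. The function name `integrate_predictors` and the parameters `n_iter` and `smooth` are hypothetical, and the additive smoothing makes the quality step a regularized variant of a pure ML estimate.

```python
def integrate_predictors(votes, n_iter=50, smooth=1.0):
    """Combine binary disorder calls without any labeled residues.

    votes[i][j] is 1 if (hypothetical) predictor j calls residue i
    disordered, else 0. Returns (labels, posterior, sens, spec).
    """
    n, m = len(votes), len(votes[0])
    # Initialize the posterior of "disordered" with a soft majority vote.
    mu = [sum(row) / m for row in votes]
    for _ in range(n_iter):
        # Quality step: smoothed likelihood-based estimates of each
        # predictor's sensitivity and specificity, weighted by mu.
        pos = sum(mu)
        neg = n - pos
        sens = [
            (sum(mu[i] * votes[i][j] for i in range(n)) + smooth)
            / (pos + 2 * smooth)
            for j in range(m)
        ]
        spec = [
            (sum((1 - mu[i]) * (1 - votes[i][j]) for i in range(n)) + smooth)
            / (neg + 2 * smooth)
            for j in range(m)
        ]
        prior = (pos + smooth) / (n + 2 * smooth)
        # Label step: posterior probability of disorder for each residue,
        # assuming predictors are conditionally independent.
        new_mu = []
        for row in votes:
            a, b = prior, 1.0 - prior
            for j, v in enumerate(row):
                a *= sens[j] if v else 1.0 - sens[j]
                b *= 1.0 - spec[j] if v else spec[j]
            new_mu.append(a / (a + b))
        mu = new_mu
    # MAP label: the more probable of the two states for each residue.
    labels = [1 if p > 0.5 else 0 for p in mu]
    return labels, mu, sens, spec
```

In this sketch, the returned `sens` and `spec` values play the role of the per-predictor quality estimates, which is how such a scheme could suggest which predictors suit a given set of sequences. For long inputs or many predictors, working in log space would avoid underflow in the products.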
The proposed algorithm can potentially be used to predict protein disorder and provide helpful suggestions on choosing suitable disorder predictors for unknown protein sequences.