Han Pengfei, Zhang Xiuzhen, Feng Zhi-Ping
School of Computer Science and IT, RMIT University, Melbourne, VIC 3001, Australia.
BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S42. doi: 10.1186/1471-2105-10-S1-S42.
Intrinsically unstructured or disordered proteins are common and functionally important. Prediction of disordered regions in proteins can provide useful information for understanding protein function and for high-throughput determination of protein structures.
In this paper, algorithms are presented to predict long and short disordered regions in proteins, namely the long disordered region prediction algorithm DRaai-L and the short disordered region prediction algorithm DRaai-S. These algorithms are based on the Random Forest machine learning model and on profiles of amino acid indices representing various physicochemical and biochemical properties of the 20 amino acids.
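To make the idea of an amino acid index profile concrete, the following is a minimal sketch of how one index can be turned into a per-residue feature. It is an illustration under stated assumptions, not the DRaai implementation: the abstract does not name the indices used, so the Kyte-Doolittle hydropathy scale stands in as an example index, and the function name `index_profile` and the window size are hypothetical.

```python
# Kyte-Doolittle hydropathy scale, used here only as an example amino acid
# index (assumption: the abstract does not say which indices DRaai uses).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def index_profile(sequence, index=KYTE_DOOLITTLE, window=5):
    """Return one smoothed index value per residue.

    Each value is the mean of the index over a window centred on the
    residue; the window is truncated at the sequence ends. The window
    size is a hypothetical choice, not a value taken from the paper.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    values = [index[aa] for aa in sequence]  # KeyError on non-standard residues
    profile = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        profile.append(sum(segment) / len(segment))
    return profile
```

Profiles computed this way for several indices could be stacked into one feature vector per residue and passed to a random forest classifier that labels each residue as ordered or disordered. The choice of classifier library and its settings is left open here.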
Experiments on DisProt3.6 and CASP7 demonstrate that some sets of amino acid indices are strongly associated with the ordered or disordered status of residues. Our algorithms, which use the profiles of these amino acid indices as input features to predict disordered regions in proteins, outperform predictors based on amino acid composition and reduced amino acid composition, and also outperform many existing algorithms. Our studies suggest that profiles of amino acid indices combined with the Random Forest learning model are an important complementary method for pinpointing disordered regions in proteins.