Malhis Nawar, Wong Eric T C, Nassar Roy, Gsponer Jörg
Centre for High-Throughput Biology, University of British Columbia, Vancouver, BC, Canada.
Centre for High-Throughput Biology, University of British Columbia, Vancouver, BC, Canada; Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada.
PLoS One. 2015 Oct 30;10(10):e0141603. doi: 10.1371/journal.pone.0141603. eCollection 2015.
Intrinsically disordered regions of proteins play an essential role in the regulation of various biological processes. Key to their regulatory function is often binding to globular protein domains via sequence elements known as molecular recognition features (MoRFs). Developing computational tools that identify candidate MoRF locations in amino acid sequences is an important task and an area of growing interest. Because MoRFs are relatively sparse in protein sequences, the accuracy of available MoRF predictors is often inadequate for practical use, leaving significant need and room for improvement. In this work, we introduce MoRFCHiBi_Web, which predicts MoRF locations in protein sequences with higher accuracy than current MoRF predictors.
Three distinct and largely independent property scores are computed with component predictors and then combined to generate the final MoRF propensity scores. The first score reflects the likelihood that a sequence window harbours a MoRF and is based on amino acid composition and sequence similarity information. It is generated by MoRFCHiBi using small windows of up to 40 residues. The second score identifies long stretches of protein disorder and is generated by ESpritz with the DisProt option. The third score reflects residue conservation and is assembled from PSSM files generated by PSI-BLAST. These propensity scores are processed and then hierarchically combined using Bayes' rule to generate the final MoRFCHiBi_Web predictions.
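As an illustration of how per-residue scores can be fused with Bayes' rule, the following is a minimal sketch, not the authors' implementation. It assumes that each component score has already been processed into a probability-like value in [0, 1], that the scores are conditionally independent, and that the prior is uniform. The function names `bayes_combine` and `combine_hierarchically`, and the order in which the scores are paired, are hypothetical.

```python
from typing import List, Sequence


def bayes_combine(p_a: float, p_b: float) -> float:
    """Fuse two probability-like scores for the same residue.

    Assumptions (not taken from the abstract): the two scores are
    conditionally independent and the prior is uniform (0.5).
    """
    num = p_a * p_b
    den = num + (1.0 - p_a) * (1.0 - p_b)
    # If one score is exactly 0 and the other exactly 1, the evidence
    # is contradictory; fall back to the uninformative value 0.5.
    return num / den if den > 0.0 else 0.5


def combine_hierarchically(
    morf: Sequence[float],
    disorder: Sequence[float],
    conservation: Sequence[float],
) -> List[float]:
    """Combine three per-residue score tracks in two stages.

    The pairing order used here is an assumption. With this particular
    formula the result does not depend on the order, because the
    combination multiplies odds and is therefore associative.
    """
    if not (len(morf) == len(disorder) == len(conservation)):
        raise ValueError("score tracks must have the same length")
    combined = []
    for m, d, c in zip(morf, disorder, conservation):
        stage_one = bayes_combine(m, d)                 # first level of the hierarchy
        combined.append(bayes_combine(stage_one, c))    # second level
    return combined


if __name__ == "__main__":
    # Toy scores for a five-residue sequence (made-up values).
    morf = [0.2, 0.6, 0.8, 0.7, 0.3]
    disorder = [0.5, 0.7, 0.9, 0.6, 0.4]
    conservation = [0.5, 0.5, 0.7, 0.6, 0.5]
    for i, p in enumerate(combine_hierarchically(morf, disorder, conservation), 1):
        print(f"residue {i}: {p:.3f}")
```

Under these assumptions a score of 0.5 is neutral: combining it with any other score leaves that score unchanged, so an uninformative component does not distort the final propensity.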
MoRFCHiBi_Web was tested on three datasets. The results show that MoRFCHiBi_Web outperforms previously developed predictors, producing less than half their false positive rate at the same true positive rate for practical threshold values. This level of accuracy, paired with its relatively high processing speed, makes MoRFCHiBi_Web a practical tool for MoRF prediction.
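The comparison above (false positive rate at a fixed true positive rate) can be made concrete with a short sketch. This is an illustrative calculation and not the evaluation code used in the study. The function name `fpr_at_tpr` is hypothetical, and it assumes per-residue scores with binary labels (1 for a MoRF residue, 0 otherwise).

```python
from typing import Sequence


def fpr_at_tpr(scores: Sequence[float], labels: Sequence[int], target_tpr: float) -> float:
    """Return the lowest false positive rate among thresholds whose
    true positive rate is at least `target_tpr`.

    A residue is predicted positive when its score is >= the threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < target_tpr <= 1.0:
        raise ValueError("target_tpr must be in (0, 1]")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Lower the threshold step by step, from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # FPR never decreases as the threshold drops, so the first
        # threshold that reaches the target TPR gives the lowest FPR.
        if tp / n_pos >= target_tpr:
            return fp / n_neg
    return 1.0  # not reached for a valid target_tpr; kept for completeness


if __name__ == "__main__":
    # Toy data (made-up values): two MoRF residues among six.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    labels = [1, 0, 1, 0, 0, 0]
    print(fpr_at_tpr(scores, labels, target_tpr=1.0))
```

Running the same function on the outputs of two predictors, with the same labels and the same target true positive rate, gives the pair of false positive rates that a claim of this kind compares.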