a Centre of Advanced Study in Crystallography & Biophysics, University of Madras, Guindy Campus, Chennai 600 025, Tamilnadu, India.
b Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA.
J Biomol Struct Dyn. 2018 Dec;36(16):4338-4351. doi: 10.1080/07391102.2017.1415822. Epub 2017 Dec 27.
More than 60 prediction methods for intrinsically disordered proteins (IDPs) have been developed over the years, many of which are accessible on the World Wide Web. Nearly all of these predictors give balanced accuracies in the 65%-80% range. Because the predictors are not perfect, further studies are required to uncover the role of amino acid residues in native IDP regions as compared to predicted IDP regions. In the present work, we use sequences of 100% predicted IDP regions, false positive disorder predictions, and experimentally determined IDP regions to distinguish the characteristics of native versus predicted IDP regions. A higher occurrence of asparagine is observed in sequences of native IDP regions but not in sequences of false positive predictions of IDP regions. The occurrences of certain combinations of amino acids at the pentapeptide level provide a feature that distinguishes IDPs from globular proteins. The distinguishing features presented in this paper provide insights into the sequence fingerprints of amino acid residues in experimentally determined as compared to predicted IDP regions. These observations, and additional work along these lines, should enable improvements in the accuracy of disorder prediction algorithms.
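The two kinds of sequence statistics mentioned above, per-residue occurrence and pentapeptide occurrence, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `residue_frequencies` and `pentapeptide_counts` and the toy sequences are hypothetical and chosen only for illustration, and the abstract does not state how the counts were normalized or compared.

```python
from collections import Counter


def residue_frequencies(sequences):
    """Return the fraction of each amino acid over all given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


def pentapeptide_counts(sequences, k=5):
    """Count overlapping k-residue windows (pentapeptides when k=5)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Sequences shorter than k contribute no windows.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Toy, made-up sequences standing in for two region sets (assumption).
native_regions = ["MNNSQNGSNE", "SNNPEKNNDS"]
false_positive_regions = ["MKKEESPLAG", "GSPEEKKLAS"]

# Compare how often asparagine (N) occurs in each set.
n_native = residue_frequencies(native_regions).get("N", 0.0)
n_false = residue_frequencies(false_positive_regions).get("N", 0.0)
print(f"N frequency, native: {n_native:.2f}; false positive: {n_false:.2f}")

# List the most frequent pentapeptides in one set.
print(pentapeptide_counts(native_regions).most_common(3))
```

Counting overlapping windows is one possible convention; a real analysis would also need a background set (for example globular proteins) and a statistical test before calling any residue or pentapeptide enriched.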