Department of Chemistry and Interdisciplinary Nanoscience Center, University of Aarhus, Aarhus, Denmark.
Front Mol Biosci. 2016 Feb 11;3:4. doi: 10.3389/fmolb.2016.00004. eCollection 2016.
The protein universe consists of a continuum of structures ranging from full order to complete disorder. As the structured part of the proteome has been intensively studied, stably folded proteins are increasingly well documented and understood. However, proteins that are fully, or in large part, disordered are much less well characterized. Here we collected, in a small database, NMR chemical shifts for 117 protein sequences that are known to contain disorder. We demonstrate that NMR chemical shift data can serve as an exquisite judge of protein disorder at the residue level and can help in validation. Using secondary chemical shift analysis, we demonstrate that the proteins in the database span the full spectrum of disorder but still largely segregate into two classes: disordered proteins with small segments of order scattered along the sequence, and structured proteins with small segments of disorder inserted between the different structured regions. A detailed analysis reveals that order and disorder are distributed along the sequence in a complex, asymmetric, and highly protein-dependent manner. Access to ratified training data further suggests an avenue to improving prediction of disorder from sequence.
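The secondary chemical shift analysis mentioned in the abstract can be illustrated with a minimal sketch. A secondary chemical shift is the difference between an observed shift and a random-coil reference value for the same residue, Δδ = δ_obs − δ_rc; values near zero are generally taken to indicate disorder. The abstract does not describe how the authors scored disorder, so everything below is an assumption made for illustration: the function names `secondary_shifts` and `flag_disordered`, the 0.7 ppm cutoff, and the toy Cα numbers, which are made up and are not measured data.

```python
from typing import List, Optional


def secondary_shifts(observed: List[Optional[float]],
                     random_coil: List[float]) -> List[Optional[float]]:
    """Return observed minus random-coil shift per residue.

    Unassigned residues (None in `observed`) stay None in the result.
    """
    if len(observed) != len(random_coil):
        raise ValueError("observed and random_coil must have the same length")
    return [None if obs is None else obs - rc
            for obs, rc in zip(observed, random_coil)]


def flag_disordered(delta: List[Optional[float]],
                    threshold_ppm: float = 0.7) -> List[Optional[bool]]:
    """Flag residues whose absolute secondary shift is below the cutoff.

    The default cutoff is a hypothetical value chosen for this sketch,
    not a value taken from the paper.
    """
    return [None if d is None else abs(d) < threshold_ppm for d in delta]


# Toy input: made-up C-alpha shifts in ppm, for illustration only.
observed_ca = [52.6, 55.9, None, 45.2, 58.1]
random_coil_ca = [52.5, 53.0, 56.0, 45.1, 58.0]

delta = secondary_shifts(observed_ca, random_coil_ca)
flags = flag_disordered(delta)
```

In practice a single nucleus and a fixed cutoff would likely be too crude: combining several nuclei and smoothing over neighbouring residues are common refinements, and the choice of random-coil reference set strongly affects the result.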