Vincent Michael, Whidden Mark, Schnell Santiago
Department of Molecular & Integrative Physiology, University of Michigan Medical School, Ann Arbor, MI, USA.
Department of Molecular & Integrative Physiology, University of Michigan Medical School, Ann Arbor, MI, USA; Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, MI, USA; Brehm Center for Diabetes Research, University of Michigan Medical School, Ann Arbor, MI, USA.
Biophys Chem. 2016 Jun;213:6-16. doi: 10.1016/j.bpc.2016.03.005. Epub 2016 Apr 5.
Intrinsically disordered proteins fail to adopt a stable three-dimensional structure under physiological conditions. It is now understood that many disordered proteins are not dysfunctional, but instead engage in numerous cellular processes, including signaling and regulation. Disorder characterization from amino acid sequence relies on computational disorder prediction algorithms. Numerous large-scale investigations of disorder have been performed using these algorithms and have offered valuable insight into the prevalence of protein disorder in many organisms. However, proteome-based descriptive statistical guidelines that would enable the objective assessment of intrinsic disorder in a protein of interest remain to be established. Here we present a quantitative characterization of numerous disorder features using a rigorous non-parametric statistical approach, providing expected values and percentile cutoffs for each feature in ten eukaryotic proteomes. Our estimates utilize multiple ab initio disorder prediction algorithms grounded in physicochemical principles. Furthermore, we present novel threshold values, specific to both the prediction algorithms and the proteomes, defining the longest primary sequence length in which the significance of a continuous disordered region can be evaluated on the basis of length alone. The guidelines presented here are intended to improve the interpretation of disorder content and continuous disorder predictions from the proteomic point of view.
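As a rough illustration of the kind of features and percentile cutoffs the abstract refers to, the following Python sketch computes disorder content and the longest continuous disordered region from per-residue disorder scores, then derives empirical (non-parametric) percentiles across a set of proteins. It is not the authors' pipeline: the function names, the per-residue score format, and the 0.5 disorder threshold are all assumptions made for illustration, and real predictors may use different score scales and cutoffs.

```python
from statistics import quantiles


def disorder_content(scores, threshold=0.5):
    """Percentage of residues predicted disordered.

    `scores` is a hypothetical list of per-residue disorder scores;
    the 0.5 threshold is an assumption, not a value from the paper.
    """
    if not scores:
        return 0.0
    return 100.0 * sum(s >= threshold for s in scores) / len(scores)


def longest_disordered_region(scores, threshold=0.5):
    """Length of the longest run of consecutive disordered residues."""
    longest = current = 0
    for s in scores:
        current = current + 1 if s >= threshold else 0
        longest = max(longest, current)
    return longest


def percentile_cutoffs(values, percentiles=(50, 75, 90, 95)):
    """Empirical percentiles of a feature across a proteome.

    No distribution is assumed: the cutoffs are read directly from the
    sorted data (with linear interpolation between data points).
    Requires at least two values.
    """
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return {p: cuts[p - 1] for p in percentiles}


if __name__ == "__main__":
    # Toy "proteome" of three proteins with made-up scores.
    proteome = [
        [0.9, 0.8, 0.1, 0.7, 0.7, 0.7, 0.2],
        [0.1, 0.2, 0.3, 0.6],
        [0.9, 0.9, 0.9, 0.9],
    ]
    contents = [disorder_content(p) for p in proteome]
    print(percentile_cutoffs(contents, percentiles=(50, 90)))
```

With cutoffs of this kind computed per proteome and per predictor, a protein of interest can be placed relative to the proteome-wide distribution of a feature rather than judged against a single fixed number.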