Department of Biomedical Engineering and Center for Science & Engineering of Living Systems (CSELS), Washington University in St. Louis, MO 63130, USA.
Department of Physics, Washington University in St. Louis, MO 63130, USA.
J Mol Biol. 2022 Jan 30;434(2):167373. doi: 10.1016/j.jmb.2021.167373. Epub 2021 Dec 1.
Sequence-ensemble relationships of intrinsically disordered proteins (IDPs) are governed by binary patterns, such as the linear clustering or mixing of specific residues or residue types relative to one another. To enable the discovery of potentially important patterns shared across sequence families, we describe a computational method called NARDINI (Non-random Arrangement of Residues in Disordered Regions Inferred using Numerical Intermixing). This work was partly motivated by the observation that the parameters currently used to describe different binary patterns are not interoperable across IDPs of different amino acid compositions and lengths. In NARDINI, we generate an ensemble of scrambled sequences to set up a composition-specific null model for the patterning parameters of interest. We then compute a series of pattern-specific z-scores that quantify how far each pattern in the IDP of interest deviates from this null model. The z-scores help identify putative non-random linear sequence patterns within an IDP. We demonstrate the use of NARDINI-derived z-scores by identifying sequence patterns in three well-studied IDP systems. We also demonstrate how NARDINI can be deployed to study archetypal IDPs across homologs and orthologs. Overall, NARDINI is likely to aid in designing novel IDPs with a view toward engineering new sequence-function relationships or uncovering cryptic ones. We further propose that the z-scores introduced here are likely to be useful for theoretical and computational descriptions of sequence-ensemble relationships across IDPs of different compositions and lengths.
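The scramble-and-compare procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NARDINI's implementation. The abstract does not define which patterning parameters are used, so the sketch substitutes a simplified version of the sliding-blob δ parameter of Das and Pappu (the quantity that is normalized to give κ) as a stand-in. The function names `blob_delta` and `pattern_zscore`, the blob size of 5, and the number of scrambles are hypothetical choices. The null model is built from composition-preserving shuffles of the input sequence, and the z-score is the observed value minus the null mean, divided by the null standard deviation.

```python
import random
import statistics


def blob_delta(seq, group_a, group_b, blob=5):
    """Hypothetical stand-in for a NARDINI patterning parameter: a simplified
    Das-Pappu-style delta, i.e. the mean squared deviation of the per-blob
    A/B asymmetry from the whole-sequence asymmetry, over sliding blobs."""
    if len(seq) < blob:
        raise ValueError("sequence shorter than blob size")

    def sigma(window):
        # Asymmetry between residue groups A and B within one window.
        fa = sum(r in group_a for r in window) / len(window)
        fb = sum(r in group_b for r in window) / len(window)
        return 0.0 if fa + fb == 0 else (fa - fb) ** 2 / (fa + fb)

    overall = sigma(seq)
    blobs = [seq[i:i + blob] for i in range(len(seq) - blob + 1)]
    return sum((sigma(w) - overall) ** 2 for w in blobs) / len(blobs)


def pattern_zscore(seq, group_a, group_b, n_scrambles=1000, blob=5, seed=0):
    """z-score of the observed parameter against a composition-specific
    null model built from random shuffles of the same sequence."""
    rng = random.Random(seed)
    observed = blob_delta(seq, group_a, group_b, blob)
    residues = list(seq)
    null = []
    for _ in range(n_scrambles):
        rng.shuffle(residues)  # same residues, new order: composition is preserved
        null.append(blob_delta("".join(residues), group_a, group_b, blob))
    mu = statistics.fmean(null)
    sd = statistics.pstdev(null)
    # A zero spread means every scramble scores the same; report no deviation.
    return (observed - mu) / sd if sd > 0 else 0.0


if __name__ == "__main__":
    K, E = {"K"}, {"E"}
    print("segregated K/E:", pattern_zscore("K" * 10 + "E" * 10, K, E))
    print("alternating K/E:", pattern_zscore("KE" * 10, K, E))
```

A segregated sequence such as K10E10 should give a large positive z-score (its K and E residues are more clustered than in its scrambles), while a strictly alternating KE repeat should give a negative one (more mixed than expected by chance). Because each sequence is compared only with shuffles of itself, the z-score is referenced to that sequence's own composition and length, which is what makes such scores comparable across IDPs.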