MRC Laboratory of Molecular Biology, Cambridge, UK
Butler University, Indianapolis, IN, USA.
Mol Syst Biol. 2018 May 14;14(5):e8190. doi: 10.15252/msb.20188190.
Over 40% of the proteins encoded in any eukaryotic genome contain intrinsically disordered regions (IDRs) that do not adopt defined tertiary structures. Certain IDRs perform critical functions, but discovering them is non-trivial because the biological context determines their function. We present IDR-Screen, a framework to discover functional IDRs in a high-throughput manner by simultaneously assaying large numbers of DNA sequences that code for short disordered sequences. Functionality-conferring patterns in their protein sequence are inferred through statistical learning. Using a yeast HSF1 transcription factor-based assay, we discovered IDRs that function as transactivation domains (TADs) by screening a random sequence library and a designed library consisting of variants of 13 diverse TADs. Using machine learning, we find that segments devoid of positively charged residues but with redundant short sequence patterns of negatively charged and aromatic residues are a generic feature of TAD functionality. We anticipate that investigating defined sequence libraries for specific functions using IDR-Screen can facilitate the discovery of novel functional regions of the disordered proteome and help in understanding the impact of natural and disease variants in disordered segments.
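To make the reported sequence feature concrete, the following is a minimal sketch of how a short peptide could be checked for the composition described above: no positively charged residues, combined with negatively charged and aromatic residues. It is not the statistical learning model used in IDR-Screen. The function names, the residue groupings (histidine is not counted as positively charged here), and the thresholds `min_negative` and `min_aromatic` are illustrative assumptions.

```python
from collections import Counter

# Residue groups (assumption: His is not treated as positively charged).
POSITIVE = set("KR")
NEGATIVE = set("DE")
AROMATIC = set("FWY")


def composition_features(seq: str) -> dict:
    """Return simple composition features for a peptide sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)

    def frac(group):
        # Fraction of residues belonging to the group; 0.0 for empty input.
        return sum(counts[a] for a in group) / n if n else 0.0

    return {
        "length": n,
        "n_positive": sum(counts[a] for a in POSITIVE),
        "frac_negative": frac(NEGATIVE),
        "frac_aromatic": frac(AROMATIC),
    }


def matches_tad_like_pattern(seq: str,
                             min_negative: float = 0.2,
                             min_aromatic: float = 0.1) -> bool:
    """Hypothetical heuristic: no K/R, enough D/E, enough F/W/Y.

    The thresholds are arbitrary illustrative values, not taken from the paper.
    """
    f = composition_features(seq)
    return (f["n_positive"] == 0
            and f["frac_negative"] >= min_negative
            and f["frac_aromatic"] >= min_aromatic)
```

A heuristic like this captures only overall composition. The abstract refers to redundant short sequence patterns, which a learned model over sequence windows would represent more faithfully.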