Morffy Nicholas, Van den Broeck Lisa, Miller Caelan, Emenecker Ryan J, Bryant John A, Lee Tyler M, Sageman-Furnas Katelyn, Wilkinson Edward G, Pathak Sunita, Kotha Sanjana R, Lam Angelica, Mahatma Saloni, Pande Vikram, Waoo Aman, Wright R Clay, Holehouse Alex S, Staller Max V, Sozzani Rosangela, Strader Lucia C
Department of Biology, Duke University, Durham, NC, USA.
Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA.
Nature. 2024 Aug;632(8023):166-173. doi: 10.1038/s41586-024-07707-3. Epub 2024 Jul 17.
Gene expression in Arabidopsis is regulated by more than 1,900 transcription factors (TFs), which have been identified genome-wide by the presence of well-conserved DNA-binding domains. Activator TFs contain activation domains (ADs) that recruit coactivator complexes; however, for nearly all Arabidopsis TFs, we lack knowledge about the presence, location and transcriptional strength of their ADs. To address this gap, here we use a yeast library approach to experimentally identify Arabidopsis ADs on a proteome-wide scale, and find that more than half of the Arabidopsis TFs contain an AD. We annotate 1,553 ADs, the vast majority of which are, to our knowledge, previously unknown. Using the dataset generated, we develop a neural network to accurately predict ADs and to identify sequence features that are necessary to recruit coactivator complexes. We uncover six distinct combinations of sequence features that result in activation activity, providing a framework to interrogate the subfunctionalization of ADs. Furthermore, we identify ADs in the ancient AUXIN RESPONSE FACTOR family of TFs, revealing that AD positioning is conserved in distinct clades. Our findings provide a deep resource for understanding transcriptional activation, a framework for examining function in intrinsically disordered regions and a predictive model of ADs.