Xue Ran, Zakharov Mikhail N, Xia Yu, Bhasin Shalender, Costello James C, Jasuja Ravi
Research Program in Men's Health: Aging and Metabolism (R.X., S.B., J.C.C., R.J.), Boston Claude D. Pepper Older Americans Independence Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02215; The National Library of Medicine (M.N.Z.), National Center for Biotechnology Information, The National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland 20892; and Department of Bioengineering (Y.X.), Faculty of Engineering, McGill University, Montreal, Quebec H3A 0C3, Canada.
Mol Endocrinol. 2014 May;28(5):768-77. doi: 10.1210/me.2014-1006. Epub 2014 Mar 28.
Nuclear receptors (NRs) are a superfamily of transcription factors central to regulating many biological processes, including cell growth, death, metabolism, and immune responses. NR-mediated gene expression can be modulated by coactivators and corepressors, which act through direct physical interaction, or through protein complexes, with functional domains in NRs. One class of these domains comprises short linear motifs (SLiMs), which facilitate protein-protein interactions, phosphorylation, and ligand binding, primarily in the intrinsically disordered regions (IDRs) of proteins. Across all proteins, the number of known SLiMs is limited because IDRs are difficult to study experimentally. Computational tools provide a systematic, data-driven approach for predicting functional motifs and can be used to prioritize experimental efforts. Accordingly, several tools have been developed based on sequence conservation or biophysical features; however, discrepancies among their predictions make it difficult to determine the true candidate SLiMs. In this work, we present the ensemble predictor for short linear motifs (EPSLiM), a novel strategy to prioritize the residues that are most likely to be SLiMs in IDRs. EPSLiM applies a generalized linear model to integrate predictions from individual methodologies. We show that EPSLiM outperforms individual predictors, and we apply our method to NRs. The androgen receptor is an example, with an N-terminal domain of 559 disordered amino acids that contains several validated SLiMs important for transcriptional activation. We use the androgen receptor to illustrate the predictive performance of EPSLiM and make the results for all human and mouse NRs publicly available through the web service http://epslim.bwh.harvard.edu.
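The idea of integrating per-residue predictions with a generalized linear model can be illustrated with a minimal sketch. The abstract does not state the model family, the link function, the component predictors, or the training data, so everything below is an assumption made for illustration: a binomial GLM with a logit link (logistic regression), three hypothetical predictors, and invented toy scores that are not results from EPSLiM.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function (inverse of the logit link).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_glm(X, y, lr=0.5, epochs=2000):
    """Fit a binomial GLM with a logit link by batch gradient descent.

    X: one row per residue, one column per (hypothetical) predictor score.
    y: 1 if the residue lies in a known SLiM, else 0.
    Returns (weights, intercept).
    """
    n, k = len(X), len(X[0])
    w = [0.0] * k
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * k
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual between predicted probability and observed label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b

def ensemble_score(w, b, x):
    # Combined probability-like score for one residue.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Invented toy scores from three hypothetical predictors (NOT real data).
X = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.6],
    [0.4, 0.7, 0.8],  # predictors partly disagree; labeled as SLiM residue
    [0.2, 0.3, 0.1],
    [0.1, 0.2, 0.3],
    [0.3, 0.1, 0.2],
    [0.6, 0.2, 0.3],  # predictors partly disagree; labeled as non-SLiM
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic_glm(X, y)

# Rank residues by ensemble score, highest first, to prioritize candidates.
ranked = sorted(range(len(X)), key=lambda i: ensemble_score(w, b, X[i]), reverse=True)
high = ensemble_score(w, b, [0.9, 0.9, 0.9])
low = ensemble_score(w, b, [0.1, 0.1, 0.1])
```

In this sketch the fitted weights determine how much each component predictor contributes, so residues on which the predictors disagree still receive a single score that can be ranked. A practical implementation would more likely rely on an established statistics library and held-out validation than on this hand-written fit.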