Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA.
Center for Computational Biology, University of California, Berkeley, CA 94720, USA.
Genetics. 2023 Oct 4;225(2). doi: 10.1093/genetics/iyad131.
Transcription factors activate gene expression in development, homeostasis, and stress with DNA binding domains and activation domains. Although there exist excellent computational models for predicting DNA binding domains from protein sequence, models for predicting activation domains from protein sequence have lagged, particularly in metazoans. We recently developed a simple and accurate predictor of acidic activation domains on human transcription factors. Here, we show how the accuracy of this human predictor arises from the clustering of aromatic, leucine, and acidic residues, which together are necessary for acidic activation domain function. When we combine our predictor with the predictions of convolutional neural network (CNN) models trained in yeast, the intersection is more accurate than individual models, emphasizing that each approach carries orthogonal information. We synthesize these findings into a new set of activation domain predictions on human transcription factors.
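The two ideas in the abstract, scoring windows by their content of aromatic, leucine, and acidic residues, and keeping only the regions where two predictors agree, can be illustrated with a minimal sketch. This is not the published predictor: the window length, the thresholds, and the function names (`window_features`, `composition_hits`, `intersect_predictions`) are hypothetical choices made for illustration, and the second predictor (for example, a CNN trained in yeast) is represented only as a set of window start positions.

```python
from typing import Set, Tuple

# Residue classes named in the abstract: aromatic (W, F, Y) plus leucine, and acidic (D, E).
AROMATIC_LEUCINE = set("WFYL")
ACIDIC = set("DE")
BASIC = set("KR")  # used only to compute net charge


def window_features(window: str) -> Tuple[int, int]:
    """Return (count of W/F/Y/L residues, net charge) for one sequence window."""
    wfyl = sum(aa in AROMATIC_LEUCINE for aa in window)
    net_charge = sum(aa in BASIC for aa in window) - sum(aa in ACIDIC for aa in window)
    return wfyl, net_charge


def composition_hits(
    seq: str,
    window: int = 30,          # hypothetical window length
    min_wfyl: int = 6,         # hypothetical threshold
    max_net_charge: int = -6,  # hypothetical threshold
) -> Set[int]:
    """Start positions of windows rich in W/F/Y/L and strongly acidic."""
    hits = set()
    for start in range(len(seq) - window + 1):
        wfyl, net_charge = window_features(seq[start:start + window])
        if wfyl >= min_wfyl and net_charge <= max_net_charge:
            hits.add(start)
    return hits


def intersect_predictions(hits_a: Set[int], hits_b: Set[int]) -> Set[int]:
    """Keep only the window starts called by both predictors."""
    return hits_a & hits_b


if __name__ == "__main__":
    # Toy sequence: an acidic, aromatic/leucine-rich stretch flanked by glycines.
    toy = "G" * 10 + "DEWDEFDELDEY" + "G" * 10
    composition = composition_hits(toy, window=12, min_wfyl=4, max_net_charge=-8)
    other_model = {3, 10}  # stand-in for a second predictor's calls (hypothetical)
    print(intersect_predictions(composition, other_model))
```

Taking the intersection trades recall for precision: a window survives only if both the composition rule and the second model call it, which is one simple way to combine predictors that carry different information.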