Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA.
Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA.
Bioinformatics. 2021 Jun 9;37(9):1234-1245. doi: 10.1093/bioinformatics/btaa947.
The activity of a transcription factor (TF) in a sample of cells is the extent to which it is exerting its regulatory potential. Many methods of inferring TF activity from gene expression data have been described, but due to the lack of appropriate large-scale datasets, systematic and objective validation has not been possible until now.
We systematically evaluate and optimize the approach to TF activity inference in which a gene expression matrix is factored into a condition-independent matrix of control strengths and a condition-dependent matrix of TF activity levels. We find that expression data in which the activities of individual TFs have been perturbed are both necessary and sufficient for obtaining good performance. To a considerable extent, control strengths inferred using expression data from one growth condition carry over to other conditions, so the control strength matrices derived here can be used by others. Finally, we apply these methods to gain insight into the upstream factors that regulate the activities of yeast TFs Gcr2, Gln3, Gcn4 and Msn2.
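The factorization described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a hypothetical boolean matrix `mask` marking which TFs are allowed to regulate which genes, and it fits the expression matrix E (genes x samples) as the product of control strengths C (genes x TFs) and activities A (TFs x samples) by plain alternating least squares. The function name, the mask, and the choice of optimizer are all assumptions made for illustration.

```python
import numpy as np


def factor_expression(E, mask, n_iter=200, seed=0):
    """Approximate E (genes x samples) as C @ A.

    C (genes x TFs) holds condition-independent control strengths and is
    forced to zero wherever `mask` is False. A (TFs x samples) holds
    condition-dependent TF activity levels.
    NOTE: `mask` and this alternating scheme are illustrative assumptions.
    """
    E = np.asarray(E, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    rng = np.random.default_rng(seed)
    n_genes, _ = mask.shape

    # Random starting control strengths on the allowed edges only.
    C = np.where(mask, rng.normal(size=mask.shape), 0.0)

    for _ in range(n_iter):
        # Step 1: with C fixed, solve for activities in the least-squares sense.
        A = np.linalg.lstsq(C, E, rcond=None)[0]

        # Step 2: with A fixed, refit each gene using only its allowed TFs.
        for g in range(n_genes):
            idx = np.flatnonzero(mask[g])
            if idx.size:
                C[g, idx] = np.linalg.lstsq(A[idx].T, E[g], rcond=None)[0]

    # Final activity estimate consistent with the last C.
    A = np.linalg.lstsq(C, E, rcond=None)[0]
    return C, A
```

In a factorization of this kind, each TF's column of C and row of A are determined only up to a scale factor, since multiplying one by a nonzero constant and dividing the other by it leaves the product unchanged. Any comparison of inferred activities across TFs therefore needs a normalization convention, which this sketch does not choose.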
Evaluation code and data are available at https://doi.org/10.5281/zenodo.4050573.
Supplementary data are available at Bioinformatics online.