Department of Nutrition, Case Western Reserve University, Cleveland, OH, USA.
Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA.
BMC Bioinformatics. 2021 Apr 19;22(1):200. doi: 10.1186/s12859-021-04126-3.
Transcriptional regulation is complex, requiring multiple cis-acting (local) and trans-acting mechanisms working in concert to drive gene expression, and disruption of these processes is linked to multiple diseases. Previous computational attempts to understand the influence of regulatory mechanisms on gene expression have used prediction models whose input features are derived from cis-regulatory factors. However, local chromatin looping and trans-acting mechanisms are also known to influence transcriptional regulation, and their inclusion may improve model accuracy and interpretation. In this study, we create a general model of transcription factor influence on gene expression by incorporating both cis and trans gene regulatory features.
We describe a computational framework to model gene expression for the GM12878 and K562 cell lines. The framework weights the impact of transcription factor-based regulatory data using multi-omics gene regulatory networks, which account for both cis- and trans-acting mechanisms, and measures of the local chromatin context. These prediction models perform significantly better than models containing cis-regulatory features alone. Models that additionally integrate long-distance chromatin interactions (chromatin looping) between distal transcription factor binding regions and gene promoters also show improved accuracy. As a demonstration of their utility, effect estimates from these models were used to weight cis-regulatory rare variants in sequence kernel association test (SKAT) analyses of gene expression.
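The general idea of weighting transcription factor features before fitting an expression model can be illustrated with a minimal sketch. It is not the authors' implementation: the element-wise weighting scheme, the ridge regression, the function names (`weight_tf_features`, `fit_ridge`, `r_squared`), and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np


def weight_tf_features(binding, network_weight):
    """Weight cis TF-binding scores by regulatory-network edge weights.

    Both inputs are genes x TFs matrices. Element-wise multiplication is an
    assumed, illustrative weighting scheme, not the one used in the paper.
    """
    binding = np.asarray(binding, dtype=float)
    network_weight = np.asarray(network_weight, dtype=float)
    if binding.shape != network_weight.shape:
        raise ValueError("binding and network_weight must have the same shape")
    return binding * network_weight


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # centre features so the intercept is not penalized
    yc = y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def r_squared(X, y, coef, intercept):
    """Coefficient of determination of the fitted linear model."""
    y = np.asarray(y, dtype=float)
    pred = np.asarray(X, dtype=float) @ coef + intercept
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic data only: 500 hypothetical genes, 5 hypothetical TFs.
    rng = np.random.default_rng(0)
    binding = rng.random((500, 5))   # stand-in for cis TF-binding scores
    network = rng.random((500, 5))   # stand-in for TF-gene network weights
    features = weight_tf_features(binding, network)

    true_effects = np.array([1.5, -0.8, 0.0, 2.0, 0.5])
    expression = features @ true_effects + 0.1 * rng.normal(size=500)

    coef, intercept = fit_ridge(features, expression, alpha=1e-3)
    print("per-TF effect estimates:", np.round(coef, 2))
    print("R^2:", round(r_squared(features, expression, coef, intercept), 3))
```

In this sketch each fitted coefficient plays the role of a per-transcription-factor effect estimate. A cis-only baseline would correspond to fitting the same model on `binding` alone, without the network weighting.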
Our models generate refined effect estimates for the influence of individual transcription factors on gene expression, allowing characterization of their roles across the genome. This work also provides a framework for integrating multiple data types into a single model of transcriptional regulation.