Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, Mexico.
Methods Mol Biol. 2021;2328:99-113. doi: 10.1007/978-1-0716-1534-8_6.
Cells express specific genes in response to internal and external perturbations in order to invoke appropriate responses. Transcription factors (TFs) orchestrate and define the expression level of genes by binding to their regulatory regions. Dysregulated expression of TFs often leads to aberrant expression of their target genes and is responsible for several diseases, including cancers. Over the last two decades, many studies have experimentally identified the target genes of individual TFs. However, these studies cover only a small fraction of the TFs encoded by an organism, and only those amenable to experimental settings. These experimental limitations have motivated many computational techniques for predicting the target genes of TFs. Linear modeling of gene expression is one of the most promising of these approaches, and it is readily applicable to the thousands of expression datasets available in the public domain across diverse phenotypes. Linear models assume that the expression of a gene is the sum of the expression of the TFs regulating it. In this chapter, I introduce mathematical programming for the linear modeling of gene expression, which has certain advantages over conventional statistical modeling approaches. It is fast, scales to the genome level, and, most importantly, allows mixed integer programming to tune the model outcome using prior knowledge of gene regulation.
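To make the idea of a linear model posed as a mathematical program concrete, the following is a minimal sketch and not necessarily the formulation used in the chapter. All symbols are assumptions introduced here for illustration: g_s is the expression of one target gene in sample s, t_{js} is the expression of TF j in sample s, w_j is an unknown weight for TF j, e_s^+ and e_s^- are non-negative residuals, z_j is a binary indicator that TF j is selected as a regulator, M is a large constant bounding the weights, K is a cap on the number of regulators, and P is a hypothetical set of TFs already known to regulate the gene.

```latex
% Linear program: least-absolute-deviation fit of one target gene
\begin{align}
\min_{w,\,e^{+},\,e^{-}} \quad & \sum_{s=1}^{S} \left( e_s^{+} + e_s^{-} \right) \\
\text{s.t.} \quad & g_s = \sum_{j=1}^{J} w_j \, t_{js} + e_s^{+} - e_s^{-}, && s = 1, \dots, S, \\
& e_s^{+} \ge 0, \quad e_s^{-} \ge 0, && s = 1, \dots, S.
\end{align}

% Mixed integer extension: select at most K regulators and
% force TFs with prior evidence (the set P) into the model
\begin{align}
& -M z_j \le w_j \le M z_j, \quad z_j \in \{0, 1\}, && j = 1, \dots, J, \\
& \sum_{j=1}^{J} z_j \le K, \\
& z_j = 1, && j \in P.
\end{align}
```

In this sketch the binary variables are what allow prior knowledge on gene regulation to shape the outcome: setting z_j = 0 excludes a TF entirely (its weight is forced to zero), while setting z_j = 1 keeps it available as a candidate regulator.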