Interdisciplinary Biological Sciences, Northwestern University, Evanston, IL 60208, USA.
Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA.
Bioinformatics. 2019 Nov 1;35(22):4671-4678. doi: 10.1093/bioinformatics/btz256.
To understand the regulatory pathways underlying diseases, studies often investigate the differential gene expression between genetically or chemically differing cell populations. Differential expression analysis identifies global changes in transcription and enables the inference of functional roles of applied perturbations. This approach has transformed the discovery of genetic drivers of disease and possible therapies. However, differential expression analysis does not provide quantitative predictions of gene expression in untested conditions. We present a hybrid approach, termed Differential Expression in Python (DiffExPy), that uniquely combines discrete, differential expression analysis with in silico differential equation simulations to yield accurate, quantitative predictions of gene expression from time-series data.
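The "discrete" half of the hybrid approach can be pictured as turning a time series into a sequence of up/unchanged/down calls. The sketch below is only an illustration under stated assumptions: the function name `discretize_trajectory`, the pseudocount, and the log2 fold-change cutoff are hypothetical, no statistical testing is performed, and none of this is DiffExPy's actual API.

```python
import math

def discretize_trajectory(control, perturbed, lfc_threshold=1.0, pseudocount=1.0):
    """Call each time point as 1 (up), -1 (down), or 0 (unchanged).

    Assumption: a plain log2 fold-change cutoff stands in for a proper
    differential expression test (no p-values or replicates here).
    """
    if len(control) != len(perturbed):
        raise ValueError("control and perturbed must have the same length")
    calls = []
    for c, p in zip(control, perturbed):
        # The pseudocount avoids division by zero for unexpressed genes.
        lfc = math.log2((p + pseudocount) / (c + pseudocount))
        if lfc >= lfc_threshold:
            calls.append(1)
        elif lfc <= -lfc_threshold:
            calls.append(-1)
        else:
            calls.append(0)
    return calls

# Made-up expression values for one gene at four time points.
control = [10.0, 12.0, 11.0, 10.0]
perturbed = [10.0, 13.0, 30.0, 60.0]
late_up = discretize_trajectory(control, perturbed)  # [0, 0, 1, 1]
```

A trajectory such as `[0, 0, 1, 1]` would correspond to a gene whose expression diverges from control only at late time points.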
To demonstrate the distinct insight provided by DiffExPy, we applied it to published, in vitro, time-series RNA-seq data from several genetic PI3K/PTEN variants of MCF10a cells stimulated with epidermal growth factor. DiffExPy proposed ensembles of several minimal differential equation systems for each differentially expressed gene. These systems provide quantitative models of expression for several previously uncharacterized genes and uncover new regulation by the PI3K/PTEN pathways. We validated model predictions on expression data from conditions that were not used for model training. Our discrete, differential expression analysis also identified SUZ12 and FOXA1 as possible regulators of specific groups of genes that exhibit late changes in expression. Our work reveals how DiffExPy generates quantitatively predictive models with testable, biological hypotheses from time-series expression data.
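The idea of keeping an ensemble of small differential equation models and checking them on an untrained condition can be sketched as follows. Everything here is an assumption made for illustration: the single-gene model form (constant-stimulus production with first-order degradation), the forward Euler integrator, the grid search, and the synthetic data are hypothetical stand-ins, not the models or fitting procedure used by DiffExPy.

```python
def simulate(k_syn, k_deg, stimulus, x0, times, dt=0.01):
    """Integrate dx/dt = k_syn * stimulus - k_deg * x with forward Euler.

    Returns the value of x at each requested time (times must be increasing).
    """
    x, t, out = x0, 0.0, []
    for target in times:
        while t < target - 1e-12:
            step = min(dt, target - t)
            x += step * (k_syn * stimulus - k_deg * x)
            t += step
        out.append(x)
    return out

def sse(pred, obs):
    """Sum of squared errors between two equal-length series."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs))

def fit_ensemble(times, observed, stimulus, x0, grid, keep=3):
    """Grid-search (k_syn, k_deg) and keep the `keep` best-scoring pairs."""
    scored = sorted(
        (sse(simulate(ks, kd, stimulus, x0, times), observed), ks, kd)
        for ks in grid
        for kd in grid
    )
    return [(ks, kd) for _, ks, kd in scored[:keep]]

times = [0.0, 1.0, 2.0, 4.0]
# Synthetic "training" series generated from known parameters (not real data).
train = simulate(2.0, 0.5, 1.0, 0.0, times)
grid = [0.25, 0.5, 1.0, 2.0, 4.0]
ensemble = fit_ensemble(times, train, stimulus=1.0, x0=0.0, grid=grid)

# Predict a condition not used for fitting (stimulus halved) with each member.
predictions = [simulate(ks, kd, 0.5, 0.0, times) for ks, kd in ensemble]
```

Keeping several parameter sets rather than a single best fit gives a spread of predictions for the held-out condition, which can then be compared against measured expression.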
DiffExPy is available on GitHub (https://github.com/bagherilab/diffexpy).
Supplementary data are available at Bioinformatics online.