Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA.
Bioinformatics. 2018 Sep 1;34(17):i603-i611. doi: 10.1093/bioinformatics/bty563.
Rapid progress in gene expression profiling has driven recent biological studies across many fields, where gene expression data characterize cell conditions and regulatory mechanisms under different experimental circumstances. Despite the widespread application of gene expression profiling and advances in high-throughput technologies, profiling at the genome-wide level remains expensive and difficult. Previous studies found that the expression patterns of different genes are highly correlated, so that a small subset of genes can approximately describe the entire transcriptome. In the Library of Integrated Network-Based Cellular Signatures (LINCS) program, a set of ∼1000 landmark genes has been identified that contains ∼80% of the information in the whole genome and can be used to predict the expression of the remaining genes. As a cost-effective profiling strategy, traditional methods measure the profiles of the landmark genes and then infer the expression of the other target genes via linear models. However, linear models lack the capacity to capture the non-linear associations in gene regulatory networks.
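The linear baseline described above can be sketched as an ordinary least-squares fit from landmark genes to target genes. This is a minimal illustration on synthetic data, not the LINCS pipeline: the function names, the matrix sizes, and the squared-response "non-linear" targets are all assumptions made for the example.

```python
import numpy as np

def fit_linear_inference(landmarks, targets):
    """Least-squares fit of targets ~ [landmarks, 1] @ W (one column of W per target gene)."""
    X = np.hstack([landmarks, np.ones((landmarks.shape[0], 1))])  # append intercept column
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return W

def predict_targets(landmarks, W):
    """Apply the fitted linear map to landmark expression profiles."""
    X = np.hstack([landmarks, np.ones((landmarks.shape[0], 1))])
    return X @ W

# Synthetic stand-in data (assumed sizes; real data has ~1000 landmark genes).
rng = np.random.default_rng(0)
n_samples, n_landmark, n_target = 200, 10, 5
landmarks = rng.normal(size=(n_samples, n_landmark))
true_W = rng.normal(size=(n_landmark, n_target))

# Case 1: targets depend linearly on landmarks, so the linear model recovers them.
linear_targets = landmarks @ true_W + 0.5
W_lin = fit_linear_inference(landmarks, linear_targets)
mae_linear = np.mean(np.abs(predict_targets(landmarks, W_lin) - linear_targets))

# Case 2: targets depend non-linearly (squared response), which a linear map cannot express.
nonlinear_targets = (landmarks @ true_W) ** 2
W_non = fit_linear_inference(landmarks, nonlinear_targets)
mae_nonlinear = np.mean(np.abs(predict_targets(landmarks, W_non) - nonlinear_targets))

print(f"MAE, linear targets:     {mae_linear:.3e}")
print(f"MAE, non-linear targets: {mae_nonlinear:.3e}")
```

On the linear targets the fit is exact up to floating-point error, while the squared targets leave a large residual, which is the limitation that motivates a non-linear model.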
As flexible models with high representational power, deep learning models offer an alternative for capturing the complex relations among genes. In this paper, we propose a deep learning architecture for the inference of target gene expression profiles. We construct a novel conditional generative adversarial network that incorporates both adversarial and ℓ1-norm loss terms. Unlike the smooth, blurry predictions produced by a mean squared error objective, the coupled adversarial and ℓ1-norm loss function leads to sharper and more accurate predictions. We validate our method under two different settings and find consistent and significant improvements over all compared methods.
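To make the coupled objective concrete, the following sketch computes a generator loss that adds an adversarial term to an ℓ1 reconstruction term. The abstract does not give the exact formulation, so the non-saturating adversarial form, the weighting parameter `lam`, and all function names here are assumptions for illustration only.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between predicted and measured target-gene expression."""
    return np.mean(np.abs(pred - target))

def generator_adversarial_loss(d_fake_prob, eps=1e-12):
    """Non-saturating generator loss, -log D(G(x) | x) (assumed form).

    d_fake_prob holds the discriminator's probability that each generated
    profile is real, conditioned on the landmark genes.
    """
    return -np.mean(np.log(d_fake_prob + eps))

def generator_objective(pred, target, d_fake_prob, lam=1.0):
    """Coupled objective: adversarial term plus lam-weighted l1 term (lam is hypothetical)."""
    return generator_adversarial_loss(d_fake_prob) + lam * l1_loss(pred, target)

# Toy values: two samples, three target genes.
target = np.array([[0.2, 1.0, -0.5], [0.0, 0.3, 0.8]])
pred = target + 0.1                      # every prediction is off by 0.1
d_fake_prob = np.array([0.5, 0.5])       # discriminator is undecided

loss = generator_objective(pred, target, d_fake_prob, lam=1.0)
print(f"generator objective: {loss:.4f}")
```

In this toy case the objective is the sum of log 2 from the undecided discriminator and 0.1 from the ℓ1 term. In a full model both terms would be minimized by gradient descent on the generator's parameters while the discriminator is trained in alternation.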