Department of Computer Science, University of Copenhagen, Copenhagen, Denmark.
Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark.
Genome Biol. 2023 Nov 16;24(1):263. doi: 10.1186/s13059-023-03104-7.
Differential analysis of bulk RNA-seq data often suffers from a lack of good controls. Here, we present a generative model, trained solely on healthy tissues, that replaces controls. The unsupervised model learns a low-dimensional representation and can identify the closest normal representation for a given disease sample. This enables control-free, single-sample differential expression analysis. In breast cancer, we demonstrate how our approach selects marker genes and outperforms a state-of-the-art method. Furthermore, significant genes identified by the model are enriched for driver genes across cancers. Our results show that the in silico closest normal provides a more favorable comparison than control samples.
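The core idea can be illustrated with a minimal sketch: project a disease sample onto a model of normal expression, then compare the sample with its closest normal reconstruction. This is not the authors' model. It swaps in a hypothetical linear decoder in log1p space, with parameters `W` and `b` assumed to be already fitted on healthy samples, in place of the trained generative model. It finds the closest representation by ordinary least squares rather than by whatever fitting procedure the model itself uses. It also reports only per-gene log2 fold changes, without any significance testing. The function names `closest_normal` and `single_sample_log2fc` are illustrative.

```python
import numpy as np


def closest_normal(x_counts, W, b):
    """Return the latent vector and reconstruction closest to a sample.

    Hypothetical linear decoder: log1p(expression) ~= W @ z + b, with W
    (genes x latent) and b (genes,) assumed to be fitted on healthy tissue.
    Under a squared-error loss in log1p space, the closest representation
    is the least-squares solution for z.
    """
    y = np.log1p(x_counts)
    z, *_ = np.linalg.lstsq(W, y - b, rcond=None)
    normal_counts = np.expm1(W @ z + b)
    # Negative reconstructions are not meaningful counts; clip at zero.
    return z, np.clip(normal_counts, 0.0, None)


def single_sample_log2fc(x_counts, W, b, pseudocount=1.0):
    """Per-gene log2 fold change of a sample versus its closest normal."""
    _, normal_counts = closest_normal(x_counts, W, b)
    return np.log2((x_counts + pseudocount) / (normal_counts + pseudocount))


# Toy usage: 200 genes, 5 latent dimensions, random stand-in "normal" decoder.
rng = np.random.default_rng(0)
n_genes, n_latent = 200, 5
W = 0.1 * rng.standard_normal((n_genes, n_latent))
b = np.full(n_genes, 5.0)

# A sample lying exactly on the normal manifold ...
z_true = rng.standard_normal(n_latent)
sample = np.expm1(W @ z_true + b)
# ... with gene 3 up-regulated eightfold to mimic a disease-specific change.
sample[3] *= 8.0

lfc = single_sample_log2fc(sample, W, b)
top_genes = np.argsort(-lfc)[:5]  # gene 3 ranks first
```

Because the latent space is much smaller than the number of genes, the projection cannot absorb a change confined to a single gene. Most of the perturbation therefore survives as a large fold change at that gene, while genes the normal model already explains stay near zero.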