Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.
Stanford Center for Biomedical Informatics Research, Department of Medicine and Biomedical Data Science, Stanford University, Stanford, CA 94305, USA.
Cell Rep Methods. 2023 Jan 16;3(1):100392. doi: 10.1016/j.crmeth.2022.100392. eCollection 2023 Jan 23.
Despite the abundance of multimodal data, suitable statistical models that can improve our understanding of diseases with genetic underpinnings are challenging to develop. Here, we present SparseGMM, a statistical approach for gene regulatory network discovery. SparseGMM uses latent variable modeling with sparsity constraints to learn Gaussian mixtures from multiomic data. By combining coexpression patterns with a Bayesian framework, SparseGMM quantitatively measures confidence in regulators and uncertainty in target gene assignment by computing gene entropy. We apply SparseGMM to liver cancer and normal liver tissue data and evaluate discovered gene modules in an independent single-cell RNA sequencing (scRNA-seq) dataset. SparseGMM identifies PROCR as a regulator of angiogenesis and PDCD1LG2 and HNF4A as regulators of immune response and blood coagulation in cancer. Furthermore, we show that more genes have significantly higher entropy in cancer compared with normal liver. Among high-entropy genes are key multifunctional components shared by critical pathways, including p53 and estrogen signaling.
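The gene entropy mentioned above can be illustrated with a minimal sketch. It assumes that entropy means the Shannon entropy of a gene's posterior probabilities of belonging to each module in a mixture model; the abstract does not give the exact definition. The function name `assignment_entropy` and the matrix `gamma` are hypothetical, and this is not the SparseGMM implementation.

```python
import numpy as np

def assignment_entropy(responsibilities, eps=1e-12):
    """Shannon entropy (in nats) of each row of a gene-by-module probability matrix.

    Assumption: each row holds one gene's posterior probabilities of
    belonging to each module; this is an illustrative definition only.
    """
    p = np.asarray(responsibilities, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)  # renormalise rows defensively
    # Clip inside the log so that zero probabilities contribute 0 rather than NaN.
    return -np.sum(p * np.log(np.clip(p, eps, 1.0)), axis=1)

# Hypothetical posterior probabilities for 3 genes over 3 modules.
gamma = np.array([
    [1.0, 0.0, 0.0],        # gene assigned to one module with certainty
    [0.5, 0.5, 0.0],        # gene split evenly between two modules
    [1 / 3, 1 / 3, 1 / 3],  # gene spread uniformly across all modules
])
entropy = assignment_entropy(gamma)
```

Under this definition, a gene assigned to a single module with certainty has zero entropy, while a gene spread evenly over all K modules reaches the maximum, log K. High-entropy genes in this sense are those whose module membership is ambiguous.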