Department of Preventive Medicine, Northwestern University, Chicago, Illinois.
Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio.
Stat Med. 2023 Dec 10;42(28):5266-5284. doi: 10.1002/sim.9911. Epub 2023 Sep 15.
In recent years, comprehensive cancer genomics platforms such as The Cancer Genome Atlas (TCGA) have provided access to an enormous amount of high-throughput genomic data for each patient, including gene expression, DNA copy number alterations, DNA methylation, and somatic mutations. While the integration of these multi-omics datasets has the potential to provide novel insights that can lead to personalized medicine, most existing approaches focus only on gene-level analysis and lack the ability to facilitate biological findings at the pathway level. In this article, we propose Bayes-InGRiD (Bayesian Integrative Genomics Robust iDentification of cancer subgroups), a novel pathway-guided Bayesian sparse latent factor model for the simultaneous identification of cancer patient subgroups (clustering) and key molecular features (variable selection) within a unified framework, based on the joint analysis of continuous, binary, and count data. By utilizing pathway (gene set) information, Bayes-InGRiD not only enhances the accuracy and robustness of cancer patient subgroup and key molecular feature identification, but also promotes biological understanding and interpretation. Finally, to facilitate efficient posterior sampling, an alternative Gibbs sampler for logistic and negative binomial models is proposed, using Pólya-Gamma mixtures of normals to represent latent variables for binary and count data, which yields a conditionally Gaussian representation of the posterior. The R package "INGRID" implementing the proposed approach is currently available on our research group's GitHub webpage (https://dongjunchung.github.io/INGRID/).
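To illustrate why a Pólya-Gamma mixture of normals gives a conditionally Gaussian posterior, here is a minimal Python sketch for the simplest case: logistic regression with a single coefficient and a normal prior. It is not the INGRID implementation and does not cover the latent factor model, the pathway structure, or the negative binomial case. The function names, the prior variance, and the truncation level are all hypothetical choices made for this illustration. The Pólya-Gamma draw uses a truncated sum-of-gammas series, which is only an approximation; exact samplers exist and would be preferable in practice.

```python
import math
import random


def sample_pg_approx(b, c, rng, n_terms=50):
    """Approximate draw from PG(b, c) using a truncated series of
    independent Gamma(b, 1) variables (assumption: 50 terms suffice
    for illustration; the truncation slightly underestimates the mean)."""
    total = 0.0
    for k in range(1, n_terms + 1):
        g = rng.gammavariate(b, 1.0)  # shape b, scale 1
        total += g / ((k - 0.5) ** 2 + (c * c) / (4.0 * math.pi ** 2))
    return total / (2.0 * math.pi ** 2)


def gibbs_logistic_1d(x, y, n_iter=200, burn_in=50, prior_var=10.0, seed=0):
    """Gibbs sampler for y_i ~ Bernoulli(logistic(beta * x_i)) with
    prior beta ~ N(0, prior_var), via Polya-Gamma data augmentation."""
    rng = random.Random(seed)
    beta = 0.0
    # kappa_i = y_i - 1/2; the sum of kappa_i * x_i does not change across iterations
    kappa_x = sum((yi - 0.5) * xi for xi, yi in zip(x, y))
    draws = []
    for it in range(n_iter):
        # Step 1: omega_i | beta ~ PG(1, x_i * beta)
        omega = [sample_pg_approx(1.0, xi * beta, rng) for xi in x]
        # Step 2: beta | omega, y is Gaussian (the conditionally Gaussian step)
        precision = sum(w * xi * xi for w, xi in zip(omega, x)) + 1.0 / prior_var
        var = 1.0 / precision
        mean = var * kappa_x
        beta = rng.gauss(mean, math.sqrt(var))
        if it >= burn_in:
            draws.append(beta)
    return draws
```

Given covariates `x` and binary outcomes `y` as plain lists, `gibbs_logistic_1d(x, y)` returns post-burn-in draws of the coefficient, whose average estimates the posterior mean. The key design point is that, conditional on the auxiliary variables, the coefficient update is an ordinary normal draw, so no Metropolis-Hastings tuning is required.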