Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA.
Department of Biostatistics, University of Washington, Seattle, Washington, USA.
Biometrics. 2023 Sep;79(3):2705-2718. doi: 10.1111/biom.13769. Epub 2022 Oct 17.
Somatic mutations in cancer patients are inherently sparse and potentially high-dimensional. Different patients may share the same set of deregulated biological processes even though those processes are perturbed by different sets of somatically mutated genes. Therefore, when assessing associations between somatic mutations and clinical outcomes, gene-by-gene analysis is often underpowered because it does not capture the complex disease mechanisms shared across patients. Rather than testing genes one by one, an intuitive approach is to aggregate somatic mutation data across multiple genes and assess their joint association with clinical outcomes. The challenge is how to aggregate such information. Building on the optimal transport method, we propose a principled approach to estimate the similarity between the somatic mutation profiles of tumor samples across multiple genes, while accounting for gene-gene similarities defined by gene annotations or empirical mutational patterns. Using these similarities, we can assess the associations between somatic mutations and clinical outcomes by kernel regression. We applied our method to somatic mutation data from 17 cancer types and identified at least five cancer types in which somatic mutations are associated with overall survival, progression-free interval, or cytolytic activity.
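The general idea of comparing mutation profiles through optimal transport can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes balanced optimal transport on profiles normalized to sum to one, solved as a small linear program, which may differ from the formulation used in the paper. The function names `ot_distance` and `mutation_kernel`, the `bandwidth` parameter, the exponential kernel, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def ot_distance(p, q, cost):
    """Balanced optimal transport cost between discrete distributions p and q.

    cost[i, j] is the cost of moving mass from gene i to gene j
    (here: a gene-gene dissimilarity, assumed to be given).
    """
    n, m = len(p), len(q)
    # The transport plan T is flattened row-major, so T[i, j] sits at i * m + j.
    row_sums = np.kron(np.eye(n), np.ones((1, m)))  # sum_j T[i, j] = p[i]
    col_sums = np.kron(np.ones((1, n)), np.eye(m))  # sum_i T[i, j] = q[j]
    res = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([p, q]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


def mutation_kernel(mutations, cost, bandwidth=1.0):
    """Pairwise OT distances between samples and a similarity matrix from them.

    mutations: samples x genes matrix of non-negative mutation indicators.
    Assumption: every sample has at least one mutation, so each row can be
    normalized to a probability distribution.
    """
    profiles = mutations / mutations.sum(axis=1, keepdims=True)
    n = profiles.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = ot_distance(profiles[i], profiles[j], cost)
    # Illustrative similarity; it is not guaranteed to be positive semidefinite.
    return dist, np.exp(-dist / bandwidth)


# Toy data: genes 0 and 1 are functionally similar, gene 2 is unrelated.
cost = np.array([[0.0, 0.1, 1.0],
                 [0.1, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])
# Each of the three samples carries a mutation in a different gene.
mutations = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])

dist, kernel = mutation_kernel(mutations, cost)
print(np.round(dist, 3))
print(np.round(kernel, 3))
```

In this toy example, a gene-by-gene comparison would treat all three samples as equally dissimilar because no two share a mutated gene. The transport-based distance instead places the first two samples close together, since their mutated genes are similar under `cost`. A similarity matrix of this kind is what a kernel regression of a clinical outcome would take as input.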