Huang Licai, Little Paul, Huyghe Jeroen R, Shi Qian, Harrison Tabitha A, Yothers Greg, George Thomas J, Peters Ulrike, Chan Andrew T, Newcomb Polly A, Sun Wei
Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA.
Department of Health Sciences Research, Mayo Clinic, Rochester, MN.
Stat Biosci. 2021 Dec;13(3):373-385. doi: 10.1007/s12561-020-09293-0. Epub 2021 Sep 15.
Gene expression data are often collected from tissue samples that are composed of multiple cell types. Studies of cell type composition based on gene expression data from tissue samples have recently attracted increasing research interest and have led to the development of new methods for cell type composition estimation. This new information on cell type composition can be associated with individual characteristics (e.g., genetic variants) or clinical outcomes (e.g., survival time). Such association analysis can be conducted for each cell type separately, followed by multiple testing correction. An alternative approach is to evaluate the association using the composition of all cell types jointly, thus aggregating association signals across cell types. A key challenge of this approach is to account for the dependence across cell types. We propose a new method to quantify the distances between cell types while accounting for their dependencies, and we use this information for association analysis. We demonstrate our method in two applied examples, assessing the association of immune cell type composition in tumor samples from colorectal cancer patients with survival time and with SNP genotypes. We found that immune cell composition has prognostic value, and that our distance metric leads to more accurate survival time prediction than other distance metrics that ignore cell type dependencies. In addition, survival time-associated SNPs are enriched among the SNPs associated with immune cell composition.
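To make the baseline strategy mentioned in the abstract concrete (testing each cell type separately, then correcting for multiple testing), the following is a minimal Python sketch of one common correction, the Benjamini-Hochberg procedure. The abstract does not say which correction is used, so this choice is an assumption; the function name `benjamini_hochberg`, the cell type labels, and the p-values are hypothetical illustrations, not results from the study. It does not implement the proposed distance-based method.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: one p-value per cell type, each from a separate
    association test (for example, cell type fraction vs. an outcome).
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    # and capping adjusted values at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-cell-type p-values (illustrative only).
cell_type_p = {
    "CD8 T cells": 0.005,
    "B cells": 0.04,
    "NK cells": 0.03,
    "Macrophages": 0.01,
}

names = list(cell_type_p)
adjusted = benjamini_hochberg([cell_type_p[name] for name in names])
for name, q in zip(names, adjusted):
    print(f"{name}: raw p = {cell_type_p[name]:.3f}, adjusted p = {q:.3f}")
```

This per-cell-type route treats each test independently; the joint approach described in the abstract instead aggregates signal across all cell types and must therefore account for their dependence.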