Bard Jonathan E, Nowak Norma J, Buck Michael J, Sinha Satrajit
Department of Biochemistry, Jacobs School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, NY, United States.
Genomics and Bioinformatics Core, Jacobs School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, NY, United States.
Front Oncol. 2022 Jul 13;12:892207. doi: 10.3389/fonc.2022.892207. eCollection 2022.
Traditional analysis of genomic data from bulk sequencing experiments seeks to sort sample cohorts into biologically meaningful groups and compare them. To accomplish this task, large-scale databases of patient-derived samples, such as TCGA, have been established, making it possible to interrogate multiple data modalities per tumor. We have developed a computational strategy that pairs multimodal integration with spectral clustering and modern dimension reduction techniques such as PHATE to provide a more robust method for cancer subtype classification. Using this integrated approach, we examined 514 Head and Neck Squamous Cell Carcinoma (HNSC) tumor samples from TCGA across gene-expression, DNA-methylation, and microbiome data modalities. We show that these approaches, primarily developed for single-cell sequencing, can be efficiently applied to bulk tumor sequencing data. Our multimodal analysis captures the dynamic heterogeneity of HNSC, identifies new subtypes and refines existing ones, and orders tumor samples along well-defined cellular trajectories. Collectively, these results showcase the inherent molecular complexity of tumors and offer insights into carcinogenesis and the importance of targeted therapy. Computational techniques such as those highlighted in our study provide an organic and powerful approach to identifying granular patterns in large, noisy datasets that might otherwise be overlooked.
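The idea of pairing multimodal integration with spectral clustering can be illustrated with a minimal sketch. This is not the authors' pipeline, and several choices are assumptions made for illustration only. Each modality (for example gene expression, DNA methylation, or microbiome abundances) is taken to be a samples-by-features matrix whose rows refer to the same tumors in the same order. Per-modality Gaussian affinities use a median-distance bandwidth. Modalities are fused by a plain unweighted mean of their affinity matrices. The embedding follows the standard normalized spectral clustering recipe, with k-means on row-normalized eigenvectors. All function names are hypothetical, and PHATE is not reproduced here.

```python
import numpy as np


def gaussian_affinity(X, sigma=None):
    """Dense RBF affinity between the rows (samples) of one modality."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    if sigma is None:
        # Assumed bandwidth: median heuristic over nonzero pairwise distances.
        sigma = np.sqrt(np.median(d2[d2 > 0]))
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def fuse_affinities(mats):
    """Hypothetical fusion rule: unweighted mean of per-modality affinities."""
    return sum(mats) / len(mats)


def spectral_embedding(W, k):
    """Top-k eigenvectors of D^-1/2 W D^-1/2, row-normalized (Ng-Jordan-Weiss)."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U = vecs[:, -k:]             # eigenvectors of the k largest eigenvalues
    return U / np.linalg.norm(U, axis=1, keepdims=True)


def kmeans(U, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def multimodal_spectral_clustering(modalities, k):
    """Cluster samples using an affinity fused across aligned data modalities."""
    W = fuse_affinities([gaussian_affinity(X) for X in modalities])
    return kmeans(spectral_embedding(W, k), k)
```

Building one affinity per modality before fusing keeps modalities with very different feature counts and scales (for example thousands of methylation probes versus a few microbial taxa) from dominating a single concatenated distance. The mean-of-affinities rule is the simplest such fusion; weighted or iterative fusion schemes are common alternatives.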