Guo Shuai, Liu Xiaoqian, Cheng Xuesen, Jiang Yujie, Ji Shuangxi, Liang Qingnan, Koval Andrew, Li Yumei, Owen Leah A, Kim Ivana K, Aparicio Ana, Lee Sanghoon, Sood Anil K, Kopetz Scott, Shen John Paul, Weinstein John N, DeAngelis Margaret M, Chen Rui, Wang Wenyi
Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA.
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.
Genome Res. 2025 Jan 22;35(1):147-161. doi: 10.1101/gr.278822.123.
Bulk deconvolution with single-cell/nucleus RNA-seq data is critical for understanding heterogeneity in complex biological samples, yet the technological discrepancy across sequencing platforms limits deconvolution accuracy. To address this, we use an experimental design that matches inter-platform biological signals, thereby revealing the technological discrepancy, and then develop a deconvolution framework called DeMixSC using this well-matched (that is, benchmark) data. Built upon a novel weighted nonnegative least-squares framework, DeMixSC identifies and adjusts genes with high technological discrepancy and aligns the benchmark data with large patient cohorts of the matched tissue type for large-scale deconvolution. Our results using two benchmark data sets of healthy retinas and ovarian cancer tissues suggest much-improved deconvolution accuracy. Leveraging tissue-specific benchmark data sets, we applied DeMixSC to a large cohort of 453 age-related macular degeneration patients and a cohort of 30 ovarian cancer patients with various responses to neoadjuvant chemotherapy. Only DeMixSC successfully unveiled biologically meaningful differences across patient groups, demonstrating its broad applicability in diverse real-world clinical scenarios. Our findings reveal the impact of technological discrepancy on deconvolution performance and underscore the importance of a well-matched data set to resolve this challenge. The DeMixSC framework is generally applicable for accurately deconvolving large cohorts of disease tissues, including cancers, when a well-matched benchmark data set is available.
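To make the idea of weighted nonnegative least squares concrete, the following is a minimal sketch rather than the DeMixSC implementation. It assumes a reference signature matrix (genes by cell types), a bulk expression vector, and per-gene weights. The function name `weighted_nnls_proportions`, the toy numbers, and the choice to down-weight a single gene are all hypothetical. The abstract does not specify how DeMixSC derives or applies its gene-level adjustment, so the weights here are simply set by hand.

```python
import numpy as np
from scipy.optimize import nnls


def weighted_nnls_proportions(signature, bulk, weights):
    """Estimate cell-type proportions by weighted NNLS (illustrative only).

    signature: (genes x cell types) nonnegative reference expression
    bulk:      (genes,) bulk expression vector
    weights:   (genes,) nonnegative per-gene weights
    """
    root_w = np.sqrt(weights)
    # Scaling each gene (row) by sqrt(w_g) turns the weighted objective
    # sum_g w_g * (bulk_g - (signature @ p)_g)^2 into an ordinary NNLS problem.
    coef, _ = nnls(signature * root_w[:, None], bulk * root_w)
    total = coef.sum()
    # Normalize so the estimates sum to one (skip if everything is zero).
    return coef / total if total > 0 else coef


# Toy data invented for illustration: 6 genes, 3 cell types.
signature = np.array(
    [
        [10.0, 1.0, 0.0],
        [8.0, 0.0, 1.0],
        [1.0, 9.0, 0.0],
        [0.0, 10.0, 2.0],
        [1.0, 0.0, 12.0],
        [0.0, 2.0, 9.0],
    ]
)
true_props = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_props

# Assumption: gene 0 is measured very differently across platforms,
# mimicked here by inflating its bulk value.
bulk_shifted = bulk.copy()
bulk_shifted[0] *= 3.0

uniform_w = np.ones(signature.shape[0])
adjusted_w = uniform_w.copy()
adjusted_w[0] = 0.01  # hand-picked down-weighting of the discrepant gene

naive = weighted_nnls_proportions(signature, bulk_shifted, uniform_w)
adjusted = weighted_nnls_proportions(signature, bulk_shifted, adjusted_w)

print("true:    ", true_props)
print("naive:   ", np.round(naive, 3))
print("adjusted:", np.round(adjusted, 3))
```

In this toy setting, down-weighting the discrepant gene pulls the estimate back toward the true proportions. In practice the weights would have to be learned from matched benchmark data rather than chosen by hand.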