Guo Shuai, Liu Xiaoqian, Cheng Xuesen, Jiang Yujie, Ji Shuangxi, Liang Qingnan, Koval Andrew, Li Yumei, Owen Leah A, Kim Ivana K, Aparicio Ana, Shen John Paul, Kopetz Scott, Weinstein John N, DeAngelis Margaret M, Chen Rui, Wang Wenyi
Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Authors contributed equally.
bioRxiv. 2023 Nov 11:2023.10.10.561733. doi: 10.1101/2023.10.10.561733.
Bulk deconvolution with single-cell/nucleus RNA-seq data is critical for understanding heterogeneity in complex biological samples, yet the technological discrepancy across sequencing platforms limits deconvolution accuracy. To address this, we introduce an experimental design that matches inter-platform biological signals, thereby revealing the technological discrepancy, and then develop a deconvolution framework called DeMixSC using the better-matched (i.e., benchmark) data. Built upon a novel weighted nonnegative least-squares framework, DeMixSC identifies and adjusts genes with high technological discrepancy and aligns the benchmark data with large patient cohorts of the matched tissue type for large-scale deconvolution. Our results using a benchmark dataset of healthy retinas suggest much-improved deconvolution accuracy. Further analysis of a cohort of 453 patients with age-related macular degeneration supports the broad applicability of DeMixSC. Our findings reveal the impact of technological discrepancy on deconvolution performance and underscore the importance of a well-matched dataset to resolve this challenge. The developed DeMixSC framework is generally applicable for deconvolving large cohorts of disease tissues, potentially including cancer.
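To make the idea of a weighted nonnegative least-squares fit concrete, the following is a minimal sketch and not the DeMixSC implementation. It assumes a generic objective, minimising sum_g w_g (y_g - sum_k X_gk p_k)^2 subject to p_k >= 0, solved here by plain projected gradient descent. The function name `weighted_nnls`, the toy reference matrix, the bulk profile, and the gene weights are all hypothetical and chosen only for illustration; how DeMixSC actually derives its weights and adjusts genes is not specified in this abstract.

```python
def weighted_nnls(reference, bulk, weights, n_iter=2000):
    """Illustrative weighted NNLS (hypothetical helper, not the DeMixSC code).

    reference: genes x cell-types expression matrix (list of rows)
    bulk:      bulk expression per gene
    weights:   nonnegative weight per gene (not all zero)
    Returns nonnegative cell-type proportions rescaled to sum to 1.
    """
    n_types = len(reference[0])
    # Step size 1/L, where L = 2 * sum_g w_g * ||x_g||^2 is an upper bound
    # on the Lipschitz constant of the gradient.
    lipschitz = 2.0 * sum(
        w * sum(x * x for x in row) for w, row in zip(weights, reference)
    )
    step = 1.0 / lipschitz
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residuals = [
            y - sum(x * pk for x, pk in zip(row, p))
            for row, y in zip(reference, bulk)
        ]
        grad = [
            -2.0 * sum(w * r * row[k]
                       for w, r, row in zip(weights, residuals, reference))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the nonnegative orthant.
        p = [max(0.0, pk - step * g) for pk, g in zip(p, grad)]
    total = sum(p)
    return [pk / total for pk in p] if total > 0 else p


# Toy data (assumed): 4 genes, 2 cell types, true proportions [0.3, 0.7].
reference = [[10.0, 1.0], [2.0, 8.0], [5.0, 5.0], [1.0, 9.0]]
true_props = [0.3, 0.7]
bulk = [sum(x * p for x, p in zip(row, true_props)) for row in reference]
bulk[2] = 9.0  # gene 3 is distorted, mimicking a platform discrepancy

unweighted = weighted_nnls(reference, bulk, [1.0, 1.0, 1.0, 1.0])
weighted = weighted_nnls(reference, bulk, [1.0, 1.0, 0.01, 1.0])
print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

In this toy setting, down-weighting the distorted gene moves the estimated proportions closer to the simulated truth than the unweighted fit does, which is the intuition behind giving less influence to genes with high technological discrepancy.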