Ivich Adriana, Greene Casey S
Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
bioRxiv. 2025 Aug 23:2025.08.20.671333. doi: 10.1101/2025.08.20.671333.
Bulk RNA-seq deconvolution typically uses single-cell RNA-sequencing (scRNA-seq) references, but some cell types are only detectable through single-nucleus RNA sequencing (snRNA-seq). Because snRNA-seq captures nuclear, but not cytoplasmic, transcripts, direct use as a reference could reduce deconvolution accuracy. Here, we systematically benchmark strategies to integrate both modalities, focusing on transformations and gene-filtering approaches that harmonize snRNA-seq with scRNA-seq references. Across four diverse tissues, we evaluated principal component-based shifts, conditional and non-conditional variational autoencoders (scVI), and the removal of cross-modality differentially expressed genes (DEGs). While all methods improved performance relative to untransformed snRNA-seq, filtering consistent cross-modality DEGs delivered the greatest gains, often matching or surpassing scRNA-only references. Conditional scVI performed comparably and was especially effective when matched scRNA-snRNA cell types were unavailable. In real adipose bulk samples without ground truth, DEG pruning and conditional scVI provided the most robust cell-fraction estimates across donors and transformations. Together, these results demonstrate that scRNA-seq should be prioritized as the reference when available, with snRNA-seq appended only after filtering cross-modality DEGs. For less-characterized systems where DEG information is limited, conditional scVI offers a practical alternative. Our findings provide clear guidelines for modality-aware integration, enabling near-scRNA-seq accuracy in bulk deconvolution workflows.
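The DEG-filtering strategy described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the differential-expression test, so a simple log2 fold-change threshold on per-cell-type mean expression stands in for it, and "consistent" is assumed to mean the same direction of change in every matched cell type. All function names, the threshold, the pseudocount, and the toy matrices are hypothetical.

```python
import numpy as np


def consistent_cross_modality_genes(sc_mean, sn_mean, lfc_threshold=1.0, pseudocount=1.0):
    """Flag genes that differ between modalities in the same direction
    for every matched cell type.

    sc_mean, sn_mean: arrays of shape (n_genes, n_matched_cell_types) holding
    mean expression per gene and cell type; columns must be in the same order.
    Assumption: a fold-change cutoff replaces a proper DE test.
    """
    sc_mean = np.asarray(sc_mean, dtype=float)
    sn_mean = np.asarray(sn_mean, dtype=float)
    if sc_mean.shape != sn_mean.shape:
        raise ValueError("sc_mean and sn_mean must have the same shape")
    # log2 fold change of snRNA-seq relative to scRNA-seq, per gene and cell type
    lfc = np.log2(sn_mean + pseudocount) - np.log2(sc_mean + pseudocount)
    up_everywhere = np.all(lfc > lfc_threshold, axis=1)
    down_everywhere = np.all(lfc < -lfc_threshold, axis=1)
    return up_everywhere | down_everywhere


def build_reference(sc_ref, sn_only_ref, flagged):
    """Drop flagged genes, then append snRNA-only cell types as extra columns."""
    keep = ~np.asarray(flagged, dtype=bool)
    return np.hstack([np.asarray(sc_ref)[keep], np.asarray(sn_only_ref)[keep]])


# Toy data: 4 genes x 2 cell types present in both modalities (hypothetical values).
sc_mean = np.array([[10, 12], [5, 5], [0, 1], [20, 3]])
sn_mean = np.array([[9, 13], [40, 50], [0, 1], [2, 30]])
# One cell type observed only in snRNA-seq (hypothetical values).
sn_only_ref = np.array([[3], [8], [1], [7]])

flagged = consistent_cross_modality_genes(sc_mean, sn_mean)
reference = build_reference(sc_mean, sn_only_ref, flagged)
```

In this toy example only the second gene is flagged: it is higher in snRNA-seq for both matched cell types. The fourth gene changes strongly but in opposite directions, so it is kept under the "consistent" assumption. The resulting reference has three genes and three cell-type columns.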