Integrating single-cell RNA-seq datasets with substantial batch effects.

Author information

Hrovatin Karin, Moinfar Amir Ali, Zappia Luke, Lapuerta Alejandro Tejada, Lengerich Ben, Kellis Manolis, Theis Fabian J

Affiliations

Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany.

TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany.

Publication information

bioRxiv. 2024 Feb 10:2023.11.03.565463. doi: 10.1101/2023.11.03.565463.

Abstract

Integration of single-cell RNA-sequencing (scRNA-seq) datasets has become a standard part of scRNA-seq analysis, with conditional variational autoencoders (cVAEs) among the most popular approaches. Increasingly, researchers seek to map cells across challenging settings, such as across organs or species, between organoids and primary tissue, or between scRNA-seq protocols, including single-cell and single-nuclei RNA-seq. Current computational methods struggle to harmonize datasets with such substantial differences, whether they are driven by technical or biological variation. Here, we address these challenges for the popular cVAE-based approaches by introducing and comparing a series of regularization constraints. The two commonly used strategies for increasing batch correction in cVAEs, namely tuning the strength of the Kullback-Leibler divergence (KL) regularization and adversarial learning, suffer from substantial loss of biological information. Therefore, we adapt, implement, and assess alternative regularization strategies for cVAEs and investigate how they improve batch effect removal or better preserve biological variation, enabling us to propose an optimal cVAE-based integration strategy for complex systems. We show that using a VampPrior instead of the commonly used Gaussian prior not only improves the preservation of biological variation but also, unexpectedly, improves batch correction. Moreover, we show that our implementation of a cycle-consistency loss leads to significantly better biological preservation than the adversarial learning implemented in the previously proposed GLUE model. Additionally, we do not recommend relying only on KL regularization strength tuning to increase batch correction, as it removes both biological and batch information without discriminating between the two. Based on our findings, we propose a new model that combines the VampPrior and a cycle-consistency loss.
We show that using it for datasets with substantial batch effects improves downstream interpretation of cell states and biological conditions. To ease its use, we make the new model available in the scvi-tools package as an external model named sysVI. In the future, these regularization techniques could also be added to other established cVAE-based models to improve the integration of datasets with substantial batch effects.
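The abstract contrasts the standard Gaussian prior with a VampPrior, and adversarial learning with a cycle-consistency loss. To make these two terms concrete, here is a minimal NumPy sketch of how they could enter a cVAE objective. It rests on several assumptions that are not taken from the paper: a toy linear encoder/decoder conditioned on a one-hot batch covariate, a squared-error reconstruction term, randomly initialized (not trained) pseudoinputs, and hypothetical loss weights. All helper names (`encode`, `decode`, `log_vamp_prior`, `loss_terms`) are hypothetical; this is not the sysVI or scvi-tools API and not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N_GENES, N_LATENT, N_BATCHES, K = 20, 4, 2, 5  # K = number of VampPrior pseudoinputs

# Hypothetical toy parameters: linear encoder/decoder conditioned on a one-hot batch covariate.
W_mu = rng.normal(scale=0.1, size=(N_GENES + N_BATCHES, N_LATENT))
W_logvar = rng.normal(scale=0.1, size=(N_GENES + N_BATCHES, N_LATENT))
W_dec = rng.normal(scale=0.1, size=(N_LATENT + N_BATCHES, N_GENES))
# VampPrior pseudoinputs: K points in expression space (learnable in a real model), each with a batch label.
pseudo_x = rng.normal(size=(K, N_GENES))
pseudo_batch = rng.integers(0, N_BATCHES, size=K)


def one_hot(batch, n=N_BATCHES):
    return np.eye(n)[batch]


def encode(x, batch):
    """Return mean and variance of the diagonal Gaussian posterior q(z | x, batch)."""
    h = np.concatenate([x, one_hot(batch)], axis=1)
    return h @ W_mu, np.exp(h @ W_logvar)


def decode(z, batch):
    """Return reconstructed expression given a latent z and a batch covariate."""
    h = np.concatenate([z, one_hot(batch)], axis=1)
    return h @ W_dec


def log_normal(z, mu, var):
    """Log-density of a diagonal Gaussian, summed over the last (latent) axis."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mu) ** 2 / var, axis=-1)


def log_vamp_prior(z):
    """log p(z) = logsumexp_k log N(z; mu_k, var_k) - log K, components = encoded pseudoinputs."""
    mu_k, var_k = encode(pseudo_x, pseudo_batch)                # (K, L)
    comp = log_normal(z[:, None, :], mu_k[None], var_k[None])   # (N, K)
    m = comp.max(axis=1, keepdims=True)                         # stable logsumexp
    return (m + np.log(np.exp(comp - m).sum(axis=1, keepdims=True))).squeeze(1) - np.log(K)


def loss_terms(x, batch, cycle_batch):
    mu, var = encode(x, batch)
    z = mu + np.sqrt(var) * rng.normal(size=mu.shape)           # reparameterized sample
    recon = np.mean(np.sum((x - decode(z, batch)) ** 2, axis=1))
    # Single-sample Monte Carlo estimate of KL(q || VampPrior).
    kl = np.mean(log_normal(z, mu, var) - log_vamp_prior(z))
    # Cycle consistency: decode as another batch, re-encode, and compare latent means.
    mu_cycle, _ = encode(decode(mu, cycle_batch), cycle_batch)
    cycle = np.mean(np.sum((mu - mu_cycle) ** 2, axis=1))
    return recon, kl, cycle


# Usage: 8 toy cells, each cycled through the other batch.
x = rng.normal(size=(8, N_GENES))
batch = rng.integers(0, N_BATCHES, size=8)
recon, kl, cycle = loss_terms(x, batch, cycle_batch=1 - batch)
total = recon + 1.0 * kl + 2.0 * cycle  # hypothetical loss weights
print(f"recon={recon:.3f} kl={kl:.3f} cycle={cycle:.3f} total={total:.3f}")
```

A mixture prior has no closed-form KL divergence with a Gaussian posterior, which is why the sketch uses a single-sample Monte Carlo estimate. In a real model, the pseudoinputs, encoder, and decoder would be trained jointly by gradient descent rather than left at their random initialization.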

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ecab/10878644/8b4d3e473fbe/nihpp-2023.11.03.565463v2-f0001.jpg