Scaling deep identifiable models enables zero-shot characterization of single-cell biological states.

Authors

Dong Mingze, Agrawal Kriti, Fan Rong, Sefik Esen, Flavell Richard A, Kluger Yuval

Affiliations

Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA.

Department of Pathology, Yale School of Medicine, New Haven, CT, USA.

Publication information

bioRxiv. 2024 Dec 18:2023.11.11.566161. doi: 10.1101/2023.11.11.566161.

Abstract

How to identify true biological differences across samples while overcoming batch effects has been a persistent challenge in single-cell RNA-seq data analysis, hindering analyses across datasets for transferable biological findings. In this work, we show that scaling up deep identifiable models leads to a surprisingly effective solution for this challenging task. We developed scShift, a deep variational inference framework with theoretical support in disentangling batch-dependent and independent variations. By training the model with compendiums of scRNA-seq atlases, scShift shows remarkable capabilities in revealing representations of cell types and biological states in single-cell data while overcoming batch effects. We employed scShift to systematically compare lung fibrosis states across different datasets, tissues and experimental systems. scShift uniquely extrapolates lung fibrosis states to previously unseen post-COVID-19 fibrosis, characterizing universal myeloid-fibrosis signatures, potential repurposing drug targets and fibrosis-associated cell interactions. Evaluations of over 200 trained scShift models demonstrate emergent zero-shot capabilities and a scaling law beyond a transition threshold, with respect to dataset diversity. With its scaling performance on massive single-cell compendiums and exceptional zero-shot capabilities, scShift represents an important advance toward next-generation computational models for single-cell analysis.
