scTab: Scaling cross-tissue single-cell annotation models.

Affiliations

Department of Computational Health, Institute of Computational Biology, Helmholtz, Munich, Germany.

School of Computing, Information and Technology, Technical University of Munich, Munich, Germany.

Publication information

Nat Commun. 2024 Aug 4;15(1):6611. doi: 10.1038/s41467-024-51059-5.

Abstract

Identifying cellular identities is a key use case in single-cell transcriptomics. While machine learning has been leveraged to automate cell annotation predictions for some time, there has been little progress in scaling neural networks to large data sets and in constructing models that generalize well across diverse tissues. Here, we propose scTab, an automated cell type prediction model specific to tabular data, and train it using a novel data augmentation scheme across a large corpus of single-cell RNA-seq observations (22.2 million cells). In this context, we show that cross-tissue annotation requires nonlinear models and that the performance of scTab scales both in terms of training dataset size and model size. Additionally, we show that the proposed data augmentation scheme improves model generalization. In summary, we introduce a de novo cell type prediction model for single-cell RNA-seq data that can be trained across a large-scale collection of curated datasets and demonstrate the benefits of using deep learning methods in this paradigm.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3770/11298532/b5560b2f128e/41467_2024_51059_Fig1_HTML.jpg
