Gonzalez-Ferrer Jesus, Lehrer Julian, O'Farrell Ash, Paten Benedict, Teodorescu Mircea, Haussler David, Jonsson Vanessa D, Mostajo-Radji Mohammed A
Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA.
Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA.
Cell Genom. 2024 Jun 12;4(6):100581. doi: 10.1016/j.xgen.2024.100581. Epub 2024 May 31.
Cell atlases serve as vital references for automating cell labeling in new samples, yet existing classification algorithms struggle with accuracy. Here we introduce SIMS (scalable, interpretable machine learning for single cell), a low-code data-efficient pipeline for single-cell RNA classification. We benchmark SIMS against datasets from different tissues and species. We demonstrate SIMS's efficacy in classifying cells in the brain, achieving high accuracy even with small training sets (<3,500 cells) and across different samples. SIMS accurately predicts neuronal subtypes in the developing brain, shedding light on genetic changes during neuronal differentiation and postmitotic fate refinement. Finally, we apply SIMS to single-cell RNA datasets of cortical organoids to predict cell identities and uncover genetic variations between cell lines. SIMS identifies cell-line differences and misannotated cell lineages in human cortical organoids derived from different pluripotent stem cell lines. Altogether, we show that SIMS is a versatile and robust tool for cell-type classification from single-cell datasets.