Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu, 610209, China.
School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China.
Methods. 2024 Aug;228:12-21. doi: 10.1016/j.ymeth.2024.05.007. Epub 2024 May 15.
Annotating cell types in single-cell RNA sequencing (scRNA-seq) data is crucial for studying cellular heterogeneity in the tumor microenvironment. Recently, large-scale pre-trained language models (PLMs) have achieved significant progress in cell-type annotation of scRNA-seq data, effectively addressing the shortcomings of previous methods in performance and generalization. However, fully fine-tuning PLMs for each downstream task demands considerable computational resources, which is often impractical. Hence, a new research branch introduces parameter-efficient fine-tuning (PEFT), which optimizes only a small number of parameters while keeping the majority frozen, substantially reducing computational cost. Here, we use scBERT, a large-scale pre-trained model, to explore the capabilities of three PEFT methods for scRNA-seq cell-type annotation. Extensive benchmark studies across several datasets demonstrate the superior applicability of PEFT methods. Furthermore, downstream analysis with models obtained through PEFT showcases their utility for novel cell-type discovery and for model interpretability with respect to potential marker genes. Our findings underscore the considerable potential of PEFT in PLM-based cell-type annotation and offer new perspectives for the analysis of scRNA-seq data.
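To make the PEFT idea concrete, the sketch below (in PyTorch) freezes a pre-trained backbone and trains only a small low-rank adapter plus a classification head, so the optimizer touches a tiny fraction of the parameters. This is a minimal illustration of the general principle described in the abstract, not the specific PEFT methods or the scBERT architecture evaluated in the paper; the layer sizes, rank, and number of cell types are placeholder values.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Linear layer with a frozen base weight and a trainable low-rank update,
    # one common PEFT technique (illustrative only).
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep pre-trained weights unchanged
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # update starts at zero, so behavior equals the base model

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))

# Hypothetical backbone standing in for a large pre-trained model such as scBERT.
backbone = nn.Sequential(
    nn.Linear(2000, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)
for p in backbone.parameters():
    p.requires_grad = False                  # the vast majority of parameters stay frozen

# Wrap one layer with a low-rank adapter and add a small cell-type classifier head.
backbone[2] = LoRALinear(backbone[2], rank=8)
classifier = nn.Linear(256, 10)              # e.g. 10 cell types
model = nn.Sequential(backbone, classifier)

trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable /",
      sum(p.numel() for p in model.parameters()), "total parameters")

# Only the adapter and the head are optimized; frozen weights never enter the optimizer.
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

In this toy setting the trainable parameters amount to only a few percent of the total, which is the source of the computational savings the abstract refers to; the paper's benchmarks compare three such PEFT strategies against full fine-tuning on scBERT.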