Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu, 610209, China.
School of Computer Science, Chengdu University of Information Technology, Chengdu, 610225, China.
Methods. 2024 Aug;228:12-21. doi: 10.1016/j.ymeth.2024.05.007. Epub 2024 May 15.
Annotating cell types in single-cell RNA sequencing (scRNA-seq) data is crucial for studying cellular heterogeneity in the tumor microenvironment. Recently, large-scale pre-trained language models (PLMs) have achieved significant progress in cell-type annotation of scRNA-seq data, effectively addressing the shortcomings of previous methods in performance and generalization. However, fully fine-tuning PLMs for each downstream task demands considerable computational resources, which is often impractical. Hence, a new research branch introduces parameter-efficient fine-tuning (PEFT), which optimizes only a small number of parameters while keeping the majority frozen, substantially reducing computational cost. Here, we use scBERT, a large-scale pre-trained model, to explore the capabilities of three PEFT methods for scRNA-seq cell-type annotation. Extensive benchmark studies across several datasets demonstrate the superior applicability of PEFT methods. Furthermore, downstream analysis with models obtained through PEFT showcases their utility for novel cell-type discovery and for model interpretability with respect to potential marker genes. Our findings underscore the considerable potential of PEFT in PLM-based cell-type annotation and offer new perspectives for the analysis of scRNA-seq data.
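To make the PEFT idea concrete, the sketch below (in PyTorch) freezes a pre-trained backbone and trains only a small low-rank adapter plus a classification head, so the optimizer touches a tiny fraction of the parameters. This is a minimal illustration of the general principle described in the abstract, not the specific PEFT methods or the scBERT architecture evaluated in the paper; the layer sizes, rank, and number of cell types are placeholder values.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Linear layer with a frozen base weight and a trainable low-rank update,
    # one common PEFT technique (illustrative only).
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep pre-trained weights unchanged
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # update starts at zero, so behavior equals the base model

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))

# Hypothetical backbone standing in for a large pre-trained model such as scBERT.
backbone = nn.Sequential(
    nn.Linear(2000, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)
for p in backbone.parameters():
    p.requires_grad = False                  # the vast majority of parameters stay frozen

# Wrap one layer with a low-rank adapter and add a small cell-type classifier head.
backbone[2] = LoRALinear(backbone[2], rank=8)
classifier = nn.Linear(256, 10)              # e.g. 10 cell types
model = nn.Sequential(backbone, classifier)

trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable /",
      sum(p.numel() for p in model.parameters()), "total parameters")

# Only the adapter and the head are optimized; frozen weights never enter the optimizer.
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

In this toy setting the trainable parameters amount to only a few percent of the total, which is the source of the computational savings the abstract refers to; the paper's benchmarks compare three such PEFT strategies against full fine-tuning on scBERT.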