

scELMo: Embeddings from Language Models are Good Learners for Single-cell Data Analysis.

Author information

Liu Tianyu, Chen Tianqi, Zheng Wangjie, Luo Xiao, Chen Yiqun, Zhao Hongyu

Publication information

bioRxiv. 2025 Aug 23:2023.12.07.569910. doi: 10.1101/2023.12.07.569910.

Abstract

Various Foundation Models (FMs) have been built on the pre-training and fine-tuning framework to analyze single-cell data, with different degrees of success. In this manuscript, we propose scELMo (Single-cell Embedding from Language Models), a method for analyzing single-cell data that uses Large Language Models (LLMs) to generate both descriptions of metadata information and embeddings of those descriptions. We combine the embeddings from LLMs with the raw data under the zero-shot learning framework, and extend the method with a fine-tuning framework to handle different tasks. We demonstrate that scELMo is capable of cell clustering, batch effect correction, and cell-type annotation without training a new model. Moreover, the fine-tuning framework of scELMo can help with more challenging tasks, including in-silico treatment analysis and perturbation modeling. scELMo has a lighter structure and lower resource requirements. In our evaluations, our method also outperforms recent large-scale FMs (such as scGPT [1] and Geneformer [2]) and other LLM-based single-cell data analysis pipelines (such as GenePT [3] and GPTCelltype [4]), suggesting a promising path for developing domain-specific FMs.
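As an illustration of what combining LLM embeddings with raw data in a zero-shot setting might look like, the sketch below builds one embedding per cell as an expression-weighted average of per-gene text embeddings. The abstract does not specify the combination rule, so the weighted average, the function name `cell_embeddings`, and the toy arrays are all assumptions made for illustration and are not the authors' implementation.

```python
import numpy as np


def cell_embeddings(expr, gene_emb):
    """Hypothetical helper: expression-weighted average of gene embeddings.

    expr:     (n_cells, n_genes) non-negative expression matrix.
    gene_emb: (n_genes, dim) matrix, one LLM text embedding per gene.
    Returns a (n_cells, dim) matrix of cell embeddings.
    """
    expr = np.asarray(expr, dtype=float)
    gene_emb = np.asarray(gene_emb, dtype=float)
    if expr.shape[1] != gene_emb.shape[0]:
        raise ValueError("expr columns must match gene_emb rows")

    # Normalize each cell's expression so its weights sum to 1.
    totals = expr.sum(axis=1, keepdims=True)
    # Cells with zero total expression keep all-zero weights.
    weights = np.divide(expr, totals, out=np.zeros_like(expr), where=totals > 0)

    # No model is trained here: the embeddings are only re-weighted.
    return weights @ gene_emb


# Toy data (assumed): 4 genes with 3-dimensional embeddings, 3 cells.
rng = np.random.default_rng(0)
gene_emb = rng.normal(size=(4, 3))
expr = np.array([
    [1.0, 0.0, 0.0, 0.0],  # expresses only gene 0
    [1.0, 1.0, 0.0, 0.0],  # expresses genes 0 and 1 equally
    [0.0, 0.0, 0.0, 0.0],  # empty cell
])
emb = cell_embeddings(expr, gene_emb)
```

The resulting matrix could then be passed to a standard clustering or nearest-neighbor annotation step; those downstream choices are likewise outside what the abstract states.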


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e439/12393277/714946e13cb9/nihpp-2023.12.07.569910v4-f0001.jpg
