

Harnessing the Power of Single-Cell Large Language Models with Parameter Efficient Fine-Tuning using scPEFT

Authors

He Fei, Fei Ruixin, Krull Jordan E, Zhang Xinyu, Gao Mingyue, Su Li, Chen Yibo, Yu Yang, Li Jinpu, Jin Baichuan, Chang Yuzhou, Ma Anjun, Ma Qin, Xu Dong

Affiliations

Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA.

Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA.

Publication

Res Sq. 2025 Apr 25:rs.3.rs-5926885. doi: 10.21203/rs.3.rs-5926885/v1.

DOI: 10.21203/rs.3.rs-5926885/v1
PMID: 40313770
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12045372/
Abstract

Single-cell large language models (scLLMs) capture essential biological insights from vast single-cell atlases but struggle in out-of-context applications, where zero-shot predictions can be unreliable. To address this, we introduce a single-cell parameter-efficient fine-tuning (scPEFT) framework that integrates learnable, low-dimensional adapters into scLLMs. By freezing the backbone model and updating only the adapter parameters, scPEFT efficiently adapts to specific tasks using limited custom data. This approach mitigates catastrophic forgetting, reduces parameter tuning by over 96%, and decreases GPU memory usage by more than half, significantly enhancing scLLMs' accessibility for resource-constrained researchers. Validated across diverse datasets, scPEFT outperformed zero-shot models and traditional fine-tuning in disease-specific, cross-species, and under-characterized cell population tasks. Its attention-mechanism analysis identified COVID-related genes associated with specific cell states and uncovered unique blood cell subpopulations, demonstrating scPEFT's capacity for condition-specific interpretations. These findings position scPEFT as an efficient solution for improving scLLMs' utilities in general single-cell analyses.
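The core mechanism the abstract describes — freeze the backbone, insert small low-dimensional adapters, and train only the adapter weights — can be illustrated with a minimal NumPy sketch. The layer size `d`, bottleneck width `r`, the ReLU bottleneck form, and the zero-initialized up-projection are illustrative assumptions for a generic bottleneck adapter, not the paper's actual scPEFT architecture or dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

class FrozenLinear:
    """Stand-in for one frozen backbone layer of an scLLM (weights never updated)."""
    def __init__(self, d):
        self.W = rng.standard_normal((d, d)) / np.sqrt(d)
    def __call__(self, x):
        return x @ self.W
    def n_params(self):
        return self.W.size

class BottleneckAdapter:
    """Learnable adapter: down-project to r dims, ReLU, up-project, residual add."""
    def __init__(self, d, r):
        self.W_down = rng.standard_normal((d, r)) * 0.01
        self.W_up = np.zeros((r, d))  # zero-init: the adapter starts as an identity map
    def __call__(self, x):
        return x + np.maximum(x @ self.W_down, 0.0) @ self.W_up
    def n_params(self):
        return self.W_down.size + self.W_up.size

d, r = 512, 8                     # hidden size and adapter bottleneck (illustrative)
backbone = FrozenLinear(d)
adapter = BottleneckAdapter(d, r)

x = rng.standard_normal((4, d))   # a mini-batch of cell embeddings
h = adapter(backbone(x))          # only adapter weights would receive gradients

trainable = adapter.n_params()
total = backbone.n_params() + trainable
print(f"trainable fraction: {trainable / total:.3%}")
```

Even in this toy single-layer setting, the trainable fraction is about 3% of all parameters, which mirrors the abstract's claim of cutting parameter tuning by over 96%: gradients and optimizer state are needed only for the adapter, which is also why GPU memory drops and the frozen backbone cannot catastrophically forget its pretraining.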


Figures (nihpp-rs5926885v1, f0001–f0015):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/f1c409f2fecb/nihpp-rs5926885v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/7f0816c370e6/nihpp-rs5926885v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/ee65abaaeeee/nihpp-rs5926885v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/3cc427016016/nihpp-rs5926885v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/c70efa055f63/nihpp-rs5926885v1-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/0bdcb50c1580/nihpp-rs5926885v1-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/d87f04a2a51c/nihpp-rs5926885v1-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/975c986f5e4e/nihpp-rs5926885v1-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/155dd3333789/nihpp-rs5926885v1-f0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/2a389ecc573d/nihpp-rs5926885v1-f0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/854cf509c19b/nihpp-rs5926885v1-f0011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/351061416891/nihpp-rs5926885v1-f0012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/41f0c82c6091/nihpp-rs5926885v1-f0013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/eb21765efbe7/nihpp-rs5926885v1-f0014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad0/12045372/7493c96fc66f/nihpp-rs5926885v1-f0015.jpg

Similar Articles

1. Harnessing the Power of Single-Cell Large Language Models with Parameter Efficient Fine-Tuning using scPEFT. Res Sq. 2025 Apr 25:rs.3.rs-5926885. doi: 10.21203/rs.3.rs-5926885/v1.
2. Parameter-Efficient Fine-Tuning Enhances Adaptation of Single Cell Large Language Model for Cell Type Identification. bioRxiv. 2024 Jan 30:2024.01.27.577455. doi: 10.1101/2024.01.27.577455.
3. Parameter Efficient Fine-tuning of Transformer-based Masked Autoencoder Enhances Resource Constrained Neuroimage Analysis. bioRxiv. 2025 Feb 20:2025.02.15.638442. doi: 10.1101/2025.02.15.638442.
4. Towards Foundation Models and Few-Shot Parameter-Efficient Fine-Tuning for Volumetric Organ Segmentation. Med Image Anal. 2025 May 2;103:103596. doi: 10.1016/j.media.2025.103596.
5. Enhancing Few-Shot CLIP With Semantic-Aware Fine-Tuning. IEEE Trans Neural Netw Learn Syst. 2024 Aug 26;PP. doi: 10.1109/TNNLS.2024.3443394.
6. Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning. bioRxiv. 2023 Nov 10:2023.11.09.566187. doi: 10.1101/2023.11.09.566187.
7. Positional embeddings and zero-shot learning using BERT for molecular-property prediction. J Cheminform. 2025 Feb 5;17(1):17. doi: 10.1186/s13321-025-00959-9.
8. Fine-tuning protein language models boosts predictions across diverse tasks. Nat Commun. 2024 Aug 28;15(1):7407. doi: 10.1038/s41467-024-51844-2.
9. DVPT: Dynamic Visual Prompt Tuning of large pre-trained models for medical image analysis. Neural Netw. 2025 May;185:107168. doi: 10.1016/j.neunet.2025.107168. Epub 2025 Jan 16.
10. Proto-Adapter: Efficient Training-Free CLIP-Adapter for Few-Shot Image Classification. Sensors (Basel). 2024 Jun 4;24(11):3624. doi: 10.3390/s24113624.
