Qiu Ping, Chen Qianqian, Qin Hua, Fang Shuangsang, Zhang Yilin, Zhang Yanlin, Xia Tianyi, Cao Lei, Zhang Yong, Fang Xiaodong, Li Yuxiang, Hu Luni
College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
BGI Research, Beijing 102601, China.
Patterns (N Y). 2025 Jul 30;6(8):101326. doi: 10.1016/j.patter.2025.101326. eCollection 2025 Aug 8.
The application and evaluation of single-cell foundation models (scFMs) present significant challenges due to heterogeneous architectures and coding standards. To address this, we introduce BioLLM (biological large language model), a unified framework for integrating and applying scFMs to single-cell RNA sequencing analysis. BioLLM provides a unified interface that integrates diverse scFMs, eliminating architectural and coding inconsistencies to enable streamlined model access. With standardized APIs and comprehensive documentation, BioLLM supports seamless model switching and consistent benchmarking. Our comprehensive evaluation of scFMs revealed distinct strengths and limitations, highlighting scGPT's robust performance across all tasks, in both zero-shot and fine-tuning settings. Geneformer and scFoundation demonstrated strong capabilities in gene-level tasks, benefiting from effective pretraining strategies. In contrast, scBERT lagged behind, likely due to its smaller model size and limited training data. Ultimately, BioLLM aims to empower the scientific community to leverage the full potential of foundation models, advancing our understanding of complex biological systems through enhanced single-cell analysis.
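The abstract describes a unified interface that hides each scFM's architectural and coding differences behind standardized APIs, so that downstream code can switch models without change. A minimal sketch of that wrapper-and-registry pattern is shown below; all class, function, and model names here are illustrative assumptions, not BioLLM's actual API.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of a unified-interface pattern for heterogeneous
# single-cell foundation models; names are illustrative, not BioLLM's real API.
class SingleCellFM(ABC):
    """Common interface wrapping a single-cell foundation model."""

    @abstractmethod
    def get_embeddings(self, expression_matrix):
        """Return per-cell embeddings for a cells x genes expression matrix."""


MODEL_REGISTRY = {}


def register_model(name):
    """Decorator registering a wrapper class under a model name."""
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model("scgpt")
class ScGPTWrapper(SingleCellFM):
    def get_embeddings(self, expression_matrix):
        # Placeholder: a real wrapper would invoke the model's own code here,
        # translating its inputs/outputs to the shared interface.
        return [[sum(row)] for row in expression_matrix]


def load_model(name):
    """Switch models by name; downstream analysis code stays unchanged."""
    return MODEL_REGISTRY[name]()


model = load_model("scgpt")
embeddings = model.get_embeddings([[1.0, 2.0], [0.0, 3.0]])
```

With such a registry, swapping scGPT for another wrapped model (e.g. Geneformer or scFoundation) is a one-line change in `load_model`, which is what makes consistent benchmarking across models practical.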