Hou Wenpin, Ji Zhicheng
Department of Biostatistics, The Mailman School of Public Health, Columbia University, New York City, NY, USA.
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA.
Res Sq. 2023 May 2:rs.3.rs-2824971. doi: 10.21203/rs.3.rs-2824971/v1.
Cell type annotation is an essential step in single-cell RNA-seq analysis. However, it is a time-consuming process that often requires expertise in collecting canonical marker genes and manually annotating cell types. Automated cell type annotation methods typically require the acquisition of high-quality reference datasets and the development of additional pipelines. We demonstrate that GPT-4, a highly potent large language model, can automatically and accurately annotate cell types by utilizing marker gene information generated from standard single-cell RNA-seq analysis pipelines. Evaluated across hundreds of tissue types and cell types, GPT-4 generates cell type annotations exhibiting strong concordance with manual annotations, and has the potential to considerably reduce the effort and expertise needed in cell type annotation.
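As a rough illustration of the workflow the abstract describes, the sketch below turns per-cluster marker gene lists (such as those produced by a standard differential expression step) into a text prompt for a large language model, then pairs the reply with the clusters. The prompt wording, the function names `build_annotation_prompt` and `parse_annotations`, the `top_n` parameter, and the example gene lists are all assumptions made for illustration; they are not taken from the paper. The model call itself is left out.

```python
from typing import Dict, List


def build_annotation_prompt(
    markers_by_cluster: Dict[str, List[str]],
    tissue: str,
    top_n: int = 10,
) -> str:
    """Build a text prompt asking an LLM to name the cell type of each cluster.

    The wording is a hypothetical example, not the prompt used by the authors.
    """
    lines = [
        f"Identify cell types of {tissue} cells using the following marker genes.",
        "Each row lists the top marker genes of one cluster.",
        "Answer with one cell type name per row, in the same order.",
        "",
    ]
    for cluster, genes in markers_by_cluster.items():
        # Keep only the first top_n genes; the input is assumed to be ranked.
        lines.append(f"{cluster}: {', '.join(genes[:top_n])}")
    return "\n".join(lines)


def parse_annotations(reply: str, clusters: List[str]) -> Dict[str, str]:
    """Pair each non-empty line of the model reply with a cluster, in order."""
    names = [line.strip() for line in reply.splitlines() if line.strip()]
    if len(names) != len(clusters):
        raise ValueError(
            f"expected {len(clusters)} annotations, got {len(names)}"
        )
    return dict(zip(clusters, names))


if __name__ == "__main__":
    # Illustrative marker lists; real input would come from the analysis pipeline.
    markers = {
        "cluster0": ["CD3D", "CD3E", "IL7R"],
        "cluster1": ["MS4A1", "CD79A", "CD79B"],
    }
    print(build_annotation_prompt(markers, tissue="human PBMC", top_n=3))
```

The resulting string would be sent to the model through whatever client is available, and the reply passed to `parse_annotations`. Any annotations obtained this way would still need to be checked against manual annotation, as in the evaluation the abstract reports.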