Zhang Xingyuan, Ji Zhicheng
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA.
Computational Biology and Bioinformatics Program, Duke University School of Medicine, Durham, NC, USA.
Res Sq. 2025 Aug 12:rs.3.rs-7151095. doi: 10.21203/rs.3.rs-7151095/v1.
A major challenge in integrating previously analyzed single-cell RNA-seq studies is the inconsistency of cell type annotations. To address this, we developed GCTHarmony, an LLM-based method for harmonizing cell type annotations across single-cell studies. Utilizing OpenAI's text embedding model, GCTHarmony accurately maps arbitrary cell type annotations to standardized cell ontology terms and reconciles discrepancies in annotation hierarchies across studies. In a real data example, we show that GCTHarmony substantially improves the consistency of cell type annotations across single-cell studies.
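The abstract describes mapping arbitrary cell type labels to standardized cell ontology terms with a text embedding model. The sketch below shows the general idea of embedding-based nearest-neighbor matching: each term is embedded, and each label is assigned to the term with the highest cosine similarity. It is not GCTHarmony's implementation. It makes several assumptions. Cosine similarity is assumed as the matching rule. `toy_embed` is a hypothetical offline stand-in for OpenAI's embedding model, built from character trigram counts. The short term list is illustrative and is not the ontology release GCTHarmony uses.

```python
import math
from collections import Counter


# Hypothetical stand-in for a text embedding model (assumption: GCTHarmony
# calls OpenAI's embedding model instead). Character trigram counts let the
# example run offline.
def toy_embed(text: str) -> Counter:
    s = f"  {text.lower().strip()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def map_to_ontology(labels, ontology_terms, embed=toy_embed):
    """Map each free-text label to its most similar ontology term."""
    term_vecs = {t: embed(t) for t in ontology_terms}
    mapping = {}
    for label in labels:
        v = embed(label)
        mapping[label] = max(ontology_terms, key=lambda t: cosine(v, term_vecs[t]))
    return mapping


if __name__ == "__main__":
    # Illustrative terms and study-specific labels (hypothetical data).
    ontology = ["T cell", "B cell", "natural killer cell", "monocyte"]
    labels = ["T cells", "B-cell", "Monocytes"]
    print(map_to_ontology(labels, ontology))
```

A purely lexical embedding like `toy_embed` cannot resolve synonyms such as "NK cells" and "natural killer cell", which share few character trigrams. Handling equivalences of that kind is the motivation for using a learned text embedding model in the matching step.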