AnnoGCD: a generalized category discovery framework for automatic cell type annotation.

Authors

Ceccarelli Francesco, Liò Pietro, Holden Sean B

Affiliation

Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Ave, CB3 0FD, Cambridge, UK.

Publication information

NAR Genom Bioinform. 2024 Dec 4;6(4):lqae166. doi: 10.1093/nargab/lqae166. eCollection 2024 Dec.

Abstract

The identification of cell types in single-cell RNA sequencing (scRNA-seq) data is a critical task in understanding complex biological systems. Traditional supervised machine learning methods rely on large, well-labeled datasets, which are often impractical to obtain in open-world scenarios due to budget constraints and incomplete information. To address these challenges, we propose a novel computational framework, named AnnoGCD, building on Generalized Category Discovery (GCD) and Anomaly Detection (AD) for automatic cell type annotation. Our semi-supervised method combines labeled and unlabeled data to accurately classify known cell types and to discover novel ones, even in imbalanced datasets. AnnoGCD includes a semi-supervised block to first classify known cell types, followed by an unsupervised block aimed at identifying and clustering novel cell types. We evaluated our approach on five human scRNA-seq datasets and a mouse model atlas, demonstrating superior performance in both known and novel cell type identification compared to existing methods. Our model also exhibited robustness in datasets with significant class imbalance. The results suggest that AnnoGCD is a powerful tool for the automatic annotation of cell types in scRNA-seq data, providing a scalable solution for biological research and clinical applications. Our code and the datasets used for evaluations are publicly available on GitHub: https://github.com/cecca46/AnnoGCD/.
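The two-stage idea in the abstract (classify known types first, then flag and cluster what does not fit) can be illustrated with a toy sketch. This is not the authors' implementation, which is available at the GitHub repository above. Here a nearest-centroid classifier stands in for the semi-supervised block, a distance threshold stands in for anomaly detection, and plain k-means stands in for the clustering of novel types. All function names, the 2x rejection-radius rule, and the 2-D "expression" vectors are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def fit_known(labeled):
    """Stage 1 stand-in: one centroid per known type plus a rejection radius."""
    groups = defaultdict(list)
    for vec, label in labeled:
        groups[label].append(vec)
    centroids = {label: centroid(vecs) for label, vecs in groups.items()}
    # Assumed rule: reject cells farther than 2x the worst-fitting labeled cell.
    radius = 2.0 * max(euclid(vec, centroids[label]) for vec, label in labeled)
    return centroids, radius


def kmeans(points, k, iters=20):
    """Stage 2 stand-in: plain k-means with farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(euclid(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: euclid(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = centroid(members)
    return assign


def annotate(labeled, unlabeled, n_novel):
    """Label cells near a known centroid; cluster the rest as novel types."""
    centroids, radius = fit_known(labeled)
    result = [None] * len(unlabeled)
    rejected = []
    for i, vec in enumerate(unlabeled):
        label = min(centroids, key=lambda name: euclid(vec, centroids[name]))
        if euclid(vec, centroids[label]) <= radius:
            result[i] = label
        else:
            rejected.append(i)  # too far from every known type
    if rejected:
        k = min(n_novel, len(rejected))
        clusters = kmeans([unlabeled[i] for i in rejected], k)
        for i, c in zip(rejected, clusters):
            result[i] = f"novel_{c}"
    return result


# Toy data (hypothetical): two known types and two unseen groups.
offsets = [(-0.2, 0.0), (0.2, 0.0), (0.0, -0.2), (0.0, 0.2)]
labeled = [((dx, dy), "T cell") for dx, dy in offsets]
labeled += [((5.0 + dx, dy), "B cell") for dx, dy in offsets]
unlabeled = [(0.1, 0.1), (5.1, -0.1), (0.1, 8.0), (-0.1, 8.0), (8.0, 8.1), (8.0, 7.9)]

print(annotate(labeled, unlabeled, n_novel=2))
```

On this toy input the first two unlabeled cells receive the known labels, and the remaining four are rejected and split into two novel groups. In practice the number of novel groups is not known in advance, so `n_novel` here is a simplification.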

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1180/11629990/e4c061ed0429/lqae166fig1.jpg
