Tan Shawn Zheng Kai, Puig-Barbe Aleix, Goutte-Gattat Damien, Eastwood Caroline, Aevermann Brian, Avola Alida, Balhoff James P, Bayindir Ismail Ugur, Belfiore Jasmine, Caron Anita Reane, Fischer David S, George Nancy, Gyori Benjamin M, Haendel Melissa A, Hoyt Charles Tapley, Kir Huseyin, Lubiana Tiago, Matentzoglu Nicolas, Overton James A, Peng Beverly, Peters Bjoern, Quardokus Ellen M, Ray Patrick L, Roncaglia Paola, Rivera Andrea D, Stefancsik Ray, Teh Wei Kheng, Toro Sabrina, Vasilevsky Nicole, Xu Chuan, Zhang Yun, Scheuermann Richard H, Mungall Christopher J, Diehl Alexander D, Osumi-Sutherland David
Scientific Data Registration, Novo Nordisk A/S, Måløv, Denmark.
European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Saffron Walden, CB10 1SD, UK.
ArXiv. 2025 Jun 17:arXiv:2506.10037v2.
Single-cell omics technologies have transformed our understanding of cellular diversity by enabling high-resolution profiling of individual cells. However, the unprecedented scale and heterogeneity of these datasets demand robust frameworks for data integration and annotation. The Cell Ontology (CL) has emerged as a pivotal resource for achieving the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles: it provides standardized, species-agnostic terms for canonical cell types and forms a core component of a wide range of platforms and tools. In this paper, we describe the wide variety of uses of CL in these platforms and tools, and we detail ongoing work to improve and extend CL content, including the addition of transcriptomically defined types. This work is carried out in close collaboration with major atlasing efforts, including the Human Cell Atlas and the BRAIN Initiative Cell Atlas Network, to support their needs. We cover the challenges and future plans for harmonising classical and transcriptomic cell type definitions, integrating markers, and using Large Language Models (LLMs) to improve CL content and the efficiency of CL workflows.