Leveraging multiple labeled datasets for the automated annotation of single-cell RNA and ATAC data.

Authors

Sancho-Zamora Joseba, Kanhirodan Akash, Garrote Xabier, Rojas Juan Manuel Silva, Gevaert Olivier, Hernaez Mikel, Serrano Guillermo, Ochoa Idoia

Affiliations

Tecnun School of Engineering, Universidad de Navarra, Donostia, Spain.

National Institute of Technology Calicut, India.

Publication information

Comput Struct Biotechnol J. 2025 Jul 1;27:2863-2870. doi: 10.1016/j.csbj.2025.06.043. eCollection 2025.

DOI:10.1016/j.csbj.2025.06.043

PMID:40687986

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12270792/

Abstract

The creation of single-cell atlases is essential for understanding cellular diversity and heterogeneity. However, assembling these atlases is challenging due to batch effects and the need for accurate and consistent cell annotation. Current methods for single-cell RNA and ATAC sequencing (scRNA-Seq and scATAC-Seq), while effective for integration, are not optimized for cell annotation. Additionally, many annotation tools rely on external databases or reference scRNA-Seq datasets, which may limit their adaptability to specific study needs, especially for rare cell types or scATAC-Seq data. Here, we introduce JIND-Multi, a new framework designed to transfer cell-type labels across multiple annotated datasets. Notably, JIND-Multi can be applied to both scRNA-Seq and scATAC-Seq data, requiring in each case annotated data of the same type, contrary to most methods for scATAC-Seq data, which require (paired) annotated scRNA-Seq data. In both cases, JIND-Multi significantly reduces the proportion of unclassified cells while maintaining the accuracy and performance of the original JIND model, and compares favorably to state-of-the-art methods. These results demonstrate its versatility and effectiveness across different single-cell sequencing technologies. JIND-Multi represents an improvement in cell annotation, reducing unassigned cells and offering a reliable solution for both scRNA-Seq and scATAC-Seq data. Its ability to handle multiple labeled datasets enhances the precision of annotations, making it a valuable tool for the single-cell research community. JIND-Multi is publicly available at: https://github.com/ML4BM-Lab/JIND-Multi.git.
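
The trade-off the abstract describes, fewer unclassified cells at unchanged accuracy, can be made concrete with a small sketch. The code below is not the JIND-Multi implementation or its API. It assumes a generic classifier that outputs per-cell class probabilities and rejects low-confidence cells using a single hypothetical threshold. The function names, the `Unassigned` label, the threshold value, and the toy data are all illustrative assumptions.

```python
import numpy as np

UNASSIGNED = "Unassigned"  # hypothetical label for rejected (unclassified) cells


def annotate_with_rejection(probs, cell_types, threshold=0.9):
    """Give each cell its most probable type, or UNASSIGNED if confidence is low.

    probs: (n_cells, n_types) array of predicted class probabilities.
    threshold: assumed global confidence cutoff (illustrative only).
    """
    probs = np.asarray(probs, dtype=float)
    best = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    labels = np.array(cell_types, dtype=object)[best]
    labels[~confident] = UNASSIGNED
    return labels


def rejection_rate(labels):
    """Fraction of cells left unclassified."""
    return float(np.mean(labels == UNASSIGNED))


def accuracy_on_assigned(labels, truth):
    """Accuracy computed only over cells that received a label."""
    truth = np.asarray(truth, dtype=object)
    assigned = labels != UNASSIGNED
    if not assigned.any():
        return float("nan")
    return float(np.mean(labels[assigned] == truth[assigned]))


# Toy data (made up for illustration): 4 cells, 3 candidate cell types.
cell_types = ["B cell", "T cell", "Monocyte"]
probs = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.92, 0.03],
    [0.10, 0.10, 0.80],
]
truth = ["B cell", "T cell", "T cell", "Monocyte"]

labels = annotate_with_rejection(probs, cell_types, threshold=0.9)
print(list(labels))
print("rejection rate:", rejection_rate(labels))
print("accuracy on assigned cells:", accuracy_on_assigned(labels, truth))
```

Under this framing, an improvement of the kind the abstract reports would show up as a lower rejection rate with the accuracy on assigned cells staying roughly the same.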
