SwarmMAP: Swarm Learning for Decentralized Cell Type Annotation in Single Cell Sequencing Data.

Author information

Saldanha Oliver Lester, Goepp Vivien, Pfeiffer Kevin, Kim Hyojin, Zhu Jie Fu, Kramann Rafael, Hayat Sikander, Kather Jakob Nikolas

Affiliations

Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Fetscherstraße 74, Dresden, 01307, Saxony, Germany.

Department of Medicine 2, RWTH Aachen University, Medical Faculty, Pauwelsstrasse 30, Aachen, 52074, North Rhine-Westphalia, Germany.

Publication information

bioRxiv. 2025 Jan 16:2025.01.13.632775. doi: 10.1101/2025.01.13.632775.

DOI:10.1101/2025.01.13.632775

PMID:39868099

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11761033/

Abstract

Rapid technological advancements have made it possible to generate single-cell data at a large scale. Several laboratories around the world can now generate single-cell transcriptomic data from different tissues. Unsupervised clustering, followed by annotation of the cell type of the identified clusters, is a crucial step in single-cell analyses. However, there is no consensus on the marker genes to use for annotation, and cell-type annotation is currently done mostly by manual inspection of marker genes, which is irreproducible and scales poorly. Patient privacy is also a critical issue with human datasets. There is therefore a critical need to standardize and automate cell-type annotation across datasets in a privacy-preserving manner. Here, we developed SwarmMAP, which uses Swarm Learning to train machine learning models for cell-type classification on single-cell sequencing data in a decentralized way. SwarmMAP does not require any exchange of raw data between data centers. SwarmMAP achieves F1-scores of 0.93, 0.98, and 0.88 for cell-type classification in human heart, lung, and breast datasets, respectively. Swarm Learning-based models yield an average performance on par with that of models trained on centralized data, as assessed by a Mann-Whitney test. We also find that increasing the number of datasets increases cell-type prediction accuracy and enables handling of higher cell-type diversity. Together, these findings demonstrate that Swarm Learning is a viable approach to automating cell-type annotation. SwarmMAP is available at https://github.com/hayatlab/SwarmMAP.
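
The idea of training without exchanging raw data can be illustrated with a small sketch. The code below is not the SwarmMAP implementation: it assumes a toy logistic-regression classifier, synthetic "cells" with three features, and a simple size-weighted parameter average as the merge step, all simulated in one process. The names `local_step`, `merge_weights`, `swarm_train`, and `make_site` are hypothetical and chosen for illustration only. What it shows is that each site updates the model on its own data and only the model parameters are combined.

```python
import math
import random


def predict(weights, x):
    """Logistic-regression probability for one sample (z clamped for stability)."""
    z = sum(w * xi for w, xi in zip(weights, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def local_step(weights, data, lr=0.1):
    """One gradient-descent step using only this site's private data."""
    grad = [0.0] * len(weights)
    for x, y in data:
        err = predict(weights, x) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return [w - lr * g / len(data) for w, g in zip(weights, grad)]


def merge_weights(site_weights, site_sizes):
    """Combine site parameters, weighting each site by its sample count.

    Assumption: a plain weighted average stands in for the merge step.
    """
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[j] * s for w, s in zip(site_weights, site_sizes)) / total
        for j in range(n_params)
    ]


def swarm_train(sites, n_features, rounds=30, local_steps=5):
    """Alternate local training and parameter merging; raw data never moves."""
    weights = [0.0] * n_features
    for _ in range(rounds):
        updates = []
        for data in sites:
            w = list(weights)
            for _ in range(local_steps):
                w = local_step(w, data)
            updates.append(w)  # only parameters leave the site
        weights = merge_weights(updates, [len(d) for d in sites])
    return weights


def make_site(rng, n_cells, true_w):
    """Synthetic two-class data standing in for one site's labelled cells."""
    data = []
    for _ in range(n_cells):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        y = 1.0 if sum(w * xi for w, xi in zip(true_w, x)) > 0 else 0.0
        data.append((x, y))
    return data


rng = random.Random(0)
true_w = [2.0, -1.0, 0.5]
sites = [make_site(rng, n, true_w) for n in (200, 300, 150)]
model = swarm_train(sites, n_features=3)

test_set = make_site(rng, 500, true_w)
accuracy = sum(
    (predict(model, x) > 0.5) == (y == 1.0) for x, y in test_set
) / len(test_set)
print(f"held-out accuracy: {accuracy:.2f}")
```

In a real deployment the sites would be separate machines and the merge would run over a network among peers; the single-process loop here only mimics that exchange.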