Single-cell type annotation with deep learning in 265 cell types for humans.

Author information

Dong Sherry, Deng Kaiwen, Huang Xiuzhen

Affiliations

Skyline High School, Ann Arbor, MI 48103, United States.

National AI Campus and Department of Computational Biomedicine, Cedars-Sinai Medical Center, West Hollywood, CA 90069, United States.

Publication information

Bioinform Adv. 2024 Apr 8;4(1):vbae054. doi: 10.1093/bioadv/vbae054. eCollection 2024.

DOI:10.1093/bioadv/vbae054

PMID:38645719

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11031354/

Abstract

MOTIVATION

Annotating cell types is a challenging yet essential task in analyzing single-cell RNA sequencing data. However, because no gold standard exists, it is difficult to evaluate algorithms fairly, and benchmarks may favor an overfitted algorithm. To address this challenge, we developed a deep learning-based single-cell type prediction tool that assigns each cell to one of 265 human cell types, based on data from approximately five million cells.

RESULTS

We achieved a median area under the ROC curve (AUC) of 0.93 when evaluated across datasets. We found that inconsistent labeling in the existing database, whose annotations were generated by different labs, contributed to the model's mistakes. Therefore, we used cell ontology to correct the annotations and retrained the model, which raised the median AUC to 0.971. Our study reveals a factor limiting the accuracy achievable with the current database annotation and points toward algorithm-based correction of the gold standard for future automated cell annotation approaches.
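The effect described above, where labels of inconsistent granularity depress a per-cell-type AUC, can be illustrated with a small sketch. This is not the authors' implementation: the toy ontology `PARENT`, the helper names, and the example scores are all hypothetical, and the abstract does not specify how the ontology-based correction was actually performed. The sketch assumes one plausible reading, in which a cell labeled with a more specific type also counts as a positive for that type's ancestors.

```python
from statistics import median

# Hypothetical toy ontology (child -> parent); not taken from the paper.
PARENT = {
    "CD4-positive T cell": "T cell",
    "CD8-positive T cell": "T cell",
    "T cell": "lymphocyte",
    "B cell": "lymphocyte",
}

def ancestors(cell_type, parent=PARENT):
    """Return every ancestor of cell_type in the toy ontology."""
    found = set()
    while cell_type in parent:
        cell_type = parent[cell_type]
        found.add(cell_type)
    return found

def auc(scores, positives):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, positives) if y]
    neg = [s for s, y in zip(scores, positives) if not y]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def median_auc(scores_by_type, labels, hierarchical=False):
    """Median one-vs-rest AUC over cell types.

    With hierarchical=True, a cell labeled with a descendant of the target
    type is also treated as a positive (assumed correction rule).
    """
    aucs = []
    for target, scores in scores_by_type.items():
        positives = [
            lab == target or (hierarchical and target in ancestors(lab))
            for lab in labels
        ]
        value = auc(scores, positives)
        if value is not None:
            aucs.append(value)
    return median(aucs)

# Two labs annotate T cells at different levels of granularity.
labels = ["CD4-positive T cell", "T cell", "B cell", "B cell"]
scores_by_type = {
    "T cell": [0.9, 0.8, 0.2, 0.1],
    "B cell": [0.1, 0.2, 0.7, 0.9],
}

flat = median_auc(scores_by_type, labels, hierarchical=False)
corrected = median_auc(scores_by_type, labels, hierarchical=True)
print(f"flat labels:      {flat:.3f}")       # 0.833
print(f"ontology-aware:   {corrected:.3f}")  # 1.000
```

In this toy case the model scores the "CD4-positive T cell" highly for "T cell", which the flat labels count as an error; once the label hierarchy is taken into account the same predictions are scored as correct.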

AVAILABILITY AND IMPLEMENTATION

The code is available at: https://github.com/SherrySDong/Hierarchical-Correction-Improves-Automated-Single-cell-Type-Annotation. Data used in this study are listed in Supplementary Table S1 and are retrievable at the CZI database.
