A machine learning-based method for automatically identifying novel cells in annotating single-cell RNA-seq data.

Affiliations

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.

Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.

Publication information

Bioinformatics. 2022 Oct 31;38(21):4885-4892. doi: 10.1093/bioinformatics/btac617.

PMID: 36083008
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9801963/

Abstract

MOTIVATION

Single-cell RNA sequencing (scRNA-seq) has been widely used to decompose complex tissues into functionally distinct cell types. The first and usually the most important step of scRNA-seq data analysis is to accurately annotate the cell labels. In recent years, many supervised annotation methods have been developed and shown to be more convenient and accurate than unsupervised cell clustering. One challenge faced by all supervised annotation methods is the identification of novel cell types, defined as cell types that are absent from the training data and present only in the testing data. Existing methods usually label cells simply based on correlation coefficients or confidence scores, which sometimes results in an excessive number of unlabeled cells.

RESULTS

We developed a straightforward yet effective method that combines an autoencoder with iterative feature selection to automatically identify novel cells from scRNA-seq data. Our method trains an autoencoder on the labeled training data and applies it to the testing data to obtain reconstruction errors. By iteratively selecting features that demonstrate a bimodal pattern and reclustering the cells using the selected features, our method can accurately identify novel cells that are not present in the training data. We further combined this approach with a support vector machine to provide a complete solution for annotating the full range of cell types. Extensive numerical experiments using five real scRNA-seq datasets demonstrated favorable performance of the proposed method over existing methods serving similar purposes.
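The general idea of reconstruction-error-based novelty detection can be illustrated with a minimal Python sketch. This is not the CAMLU implementation (CAMLU is an R package), and several pieces are assumptions made purely for illustration: PCA stands in for the autoencoder as a linear encoder/decoder, a simple two-group separation score stands in for the bimodality criterion, the data are synthetic, and all function names and parameters (`fit_linear_autoencoder`, `detect_novel_cells`, `n_latent`, `n_features`) are hypothetical. The support vector machine step for labeling known cell types is omitted.

```python
import numpy as np


def fit_linear_autoencoder(X_train, n_latent=3):
    """PCA used as a linear autoencoder (an assumed stand-in for a neural one)."""
    mu = X_train.mean(axis=0)
    # Rows of Vt are principal directions; keep the top n_latent as the basis.
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    return mu, Vt[:n_latent]


def reconstruction_errors(X, mu, basis):
    """Squared reconstruction error per cell and per gene (cells x genes)."""
    Z = (X - mu) @ basis.T          # encode
    X_hat = Z @ basis + mu          # decode
    return (X - X_hat) ** 2


def two_means_1d(v, n_iter=50):
    """Split a 1-D vector into low/high groups with 2-means; True = high group."""
    lo, hi = v.min(), v.max()
    high = np.abs(v - hi) < np.abs(v - lo)
    for _ in range(n_iter):
        if high.all() or not high.any():
            break
        new_lo, new_hi = v[~high].mean(), v[high].mean()
        if new_lo == lo and new_hi == hi:
            break
        lo, hi = new_lo, new_hi
        high = np.abs(v - hi) < np.abs(v - lo)
    return high


def detect_novel_cells(X_train, X_test, n_latent=3, n_features=10, max_iter=10):
    """Iterate between selecting well-separated genes and reclustering cells."""
    mu, basis = fit_linear_autoencoder(X_train, n_latent)
    err = reconstruction_errors(X_test, mu, basis)
    novel = two_means_1d(err.sum(axis=1))       # initial split on total error
    genes = np.arange(err.shape[1])
    for _ in range(max_iter):
        if novel.all() or not novel.any():
            break
        hi, lo = err[novel], err[~novel]
        # Separation score: a simple proxy for a bimodal error pattern.
        score = np.abs(hi.mean(0) - lo.mean(0)) / (hi.std(0) + lo.std(0) + 1e-8)
        genes = np.argsort(score)[-n_features:]
        updated = two_means_1d(err[:, genes].sum(axis=1))
        if np.array_equal(updated, novel):
            break
        novel = updated
    return novel, np.sort(genes)


# Synthetic data: low-rank "known" expression plus noise; novel cells have
# ten shifted genes that the training data never exhibited.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 50))


def simulate(n):
    return rng.normal(size=(n, 3)) @ W + 0.1 * rng.normal(size=(n, 50))


train = simulate(200)
known = simulate(100)
novel_cells = simulate(30)
novel_cells[:, :10] += 5.0
test = np.vstack([known, novel_cells])
truth = np.r_[np.zeros(100, dtype=bool), np.ones(30, dtype=bool)]

pred, genes = detect_novel_cells(train, test)
print("accuracy:", (pred == truth).mean())
print("selected genes:", genes)
```

In this toy setting, cells whose expression cannot be reconstructed from patterns seen in training accumulate large errors on a small set of genes, and restricting the clustering to those genes is what sharpens the split between known and novel cells. Real scRNA-seq data would additionally need normalization and a nonlinear autoencoder, neither of which is shown here.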

AVAILABILITY AND IMPLEMENTATION

Our R software package CAMLU is publicly available through the Zenodo repository (https://doi.org/10.5281/zenodo.7054422) or GitHub repository (https://github.com/ziyili20/CAMLU).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
