
scCross: efficient search for rare subpopulations across multiple single-cell samples.

Affiliation

ICTEAM/INGI/Artificial Intelligence and Algorithms Group, UCLouvain, Louvain-la-Neuve 1348, Belgium.

Publication information

Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae371.

DOI:10.1093/bioinformatics/btae371
PMID:38889273
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11256925/
Abstract

MOTIVATION

Identifying rare cell types is an important task for capturing the heterogeneity of single-cell data, such as scRNA-seq. The widespread availability of such data makes it possible to aggregate multiple samples, corresponding for example to different donors, into the same study. Yet such aggregated data are often subject to batch effects between samples. Clustering them therefore generally requires data integration methods, which can lead to overcorrection and make rare cells difficult to identify. We present scCross, a biclustering method that identifies rare subpopulations of cells present across multiple single-cell samples. It jointly identifies a group of cells with specific marker genes by relying on a global sum criterion, computed over the entire subpopulation of cells, rather than on pairwise comparisons between individual cells. This proves robust to the high variability of scRNA-seq data, in particular batch effects.
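
The idea of scoring a candidate subpopulation with one sum over all of its cells and marker genes, instead of comparing cells pairwise, can be illustrated with a toy sketch. This is not the scCross objective or its search algorithm, which the abstract does not specify. The function names, the centered toy matrix (positive values mean "expressed"), the constant +0.5 batch shift, and the positive-sum selection rule are all assumptions made for illustration only.

```python
# Toy illustration of a "global sum" bicluster score (hypothetical, not scCross itself).
# Rows are cells, columns are genes; values are assumed to be centered so that
# positive means expressed and negative means not expressed.

def global_sum_score(matrix, cells, genes):
    """Sum every value in the submatrix selected by `cells` x `genes`."""
    return sum(matrix[i][j] for i in cells for j in genes)

def select_cells(matrix, genes):
    """Keep the cells whose summed value over `genes` is positive.

    Each cell is judged against the candidate marker set as a whole,
    so no cell-to-cell distance is ever computed.
    """
    return [i for i, row in enumerate(matrix) if sum(row[j] for j in genes) > 0]

# Two hypothetical batches; batch B is batch-A-like data shifted by +0.5.
matrix = [
    [3.0, 2.5, -1.0],   # cell 0: rare cell, batch A
    [3.5, 3.0, -0.5],   # cell 1: rare cell, batch B
    [-1.0, -1.0, 2.0],  # cell 2: common cell, batch A
    [-1.0, -0.5, 1.5],  # cell 3: common cell, batch A
    [-0.5, -0.5, 2.5],  # cell 4: common cell, batch B
    [-0.5, 0.0, 2.0],   # cell 5: common cell, batch B
]

marker_genes = [0, 1]  # hypothetical marker genes of the rare subpopulation
rare_cells = select_cells(matrix, marker_genes)
score = global_sum_score(matrix, rare_cells, marker_genes)
print(rare_cells, score)
```

In this toy case the rare cells from both batches are selected together even though batch B is shifted, because the shift is small relative to the marker signal. A real method would also have to search over gene sets and handle larger shifts, which this sketch does not attempt.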

RESULTS

We show through several case studies that scCross is able to identify rare subpopulations across multiple samples without performing prior data integration. In particular, it identifies a cilium subpopulation with potential new ciliary genes from lung cancer cells, which is not detected by typical alternatives. It also highlights rare subpopulations in human pancreas samples sequenced with different protocols, despite visible shifts in expression levels between batches. We further show that scCross outperforms typical alternatives at identifying a target rare cell type in a controlled experiment with artificially created batch effects. This shows the ability of scCross to efficiently identify rare cell subpopulations characterized by specific genes despite the presence of batch effects.

AVAILABILITY AND IMPLEMENTATION

The R and Scala implementation of scCross is freely available on GitHub, at https://github.com/agerniers/scCross/. A snapshot of the code and the data underlying this article are available on Zenodo, at https://zenodo.org/doi/10.5281/zenodo.10471063.

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aac6/11256925/b0ff83925c6f/btae371f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aac6/11256925/3573a599a00f/btae371f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aac6/11256925/d4adf3fdf02e/btae371f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aac6/11256925/2ec643c31df6/btae371f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aac6/11256925/4c7edf9f8c51/btae371f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aac6/11256925/2b13f6231b4c/btae371f6.jpg

