MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data.

Authors

Gerniers Alexander, Bricard Orian, Dupont Pierre

Affiliations

ICTEAM/INGI/Artificial Intelligence and Algorithms Group, UCLouvain, Louvain-la-Neuve 1348, Belgium.

de Duve Institute, UCLouvain, Brussels 1200, Belgium.

Publication

Bioinformatics. 2021 Oct 11;37(19):3220-3227. doi: 10.1093/bioinformatics/btab239.

DOI:10.1093/bioinformatics/btab239
PMID:33830183
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8504615/
Abstract

MOTIVATION

Identifying rare subpopulations of cells is a critical step in order to extract knowledge from single-cell expression data, especially when the available data is limited and rare subpopulations only contain a few cells. In this paper, we present a data mining method to identify small subpopulations of cells that present highly specific expression profiles. This objective is formalized as a constrained optimization problem that jointly identifies a small group of cells and a corresponding subset of specific genes. The proposed method extends the max-sum submatrix problem to yield genes that are, for instance, highly expressed inside a small number of cells, but have a low expression in the remaining ones.
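The max-sum submatrix formulation mentioned above can be illustrated with a small sketch. In the classic problem, one selects a subset of rows (genes) and a subset of columns (cells) of a real-valued matrix so as to maximize the sum of the selected entries. The extension shown here is an assumption made for illustration, not the paper's exact objective: it caps the number of selected cells at a hypothetical `max_cells` and subtracts a hypothetical weight `kappa` times the positive values a selected gene takes in the unselected cells, which favors genes that are high inside the group and low elsewhere. Because the objective decomposes over rows, for a fixed set of cells the best gene set simply keeps every gene with a positive score. The exhaustive search over cell subsets is only feasible for toy matrices; the function names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Toy matrix: rows are genes, columns are cells. Values are assumed to be
# centered expression levels (positive = above the gene's typical level).
M: List[List[float]] = [
    [2.0, 2.0, -1.0, -1.0, -1.0],   # gene 0: high only in cells 0 and 1
    [1.5, 1.0, -1.0, -0.5, -1.0],   # gene 1: high only in cells 0 and 1
    [-1.0, -1.0, 0.5, 0.5, 0.5],    # gene 2: mildly high in cells 2-4
    [2.0, 2.0, 1.0, 1.0, 1.0],      # gene 3: high everywhere (not specific)
]


def row_score(row: Sequence[float], cols: Tuple[int, ...], kappa: float) -> float:
    """Contribution of one gene for a fixed set of selected cells.

    ASSUMPTION: the out-of-group penalty (kappa times the positive values
    outside the selected cells) is a hypothetical stand-in, not the
    paper's exact objective.
    """
    selected = set(cols)
    inside = sum(row[c] for c in cols)
    outside = sum(max(v, 0.0) for c, v in enumerate(row) if c not in selected)
    return inside - kappa * outside


def best_submatrix(
    m: Sequence[Sequence[float]], max_cells: int, kappa: float = 0.0
) -> Tuple[float, Tuple[int, ...], Tuple[int, ...]]:
    """Exhaustively search cell subsets of size 1..max_cells.

    For a fixed cell set the objective is a sum over genes, so the
    optimal gene set keeps exactly the genes with a positive score.
    Returns (score, selected gene indices, selected cell indices).
    """
    n_cols = len(m[0])
    best: Tuple[float, Tuple[int, ...], Tuple[int, ...]] = (float("-inf"), (), ())
    for k in range(1, max_cells + 1):
        for cols in combinations(range(n_cols), k):
            scores = [row_score(row, cols, kappa) for row in m]
            rows = tuple(r for r, s in enumerate(scores) if s > 0)
            total = sum(scores[r] for r in rows)
            if total > best[0]:
                best = (total, rows, cols)
    return best


if __name__ == "__main__":
    # Plain max-sum submatrix (no penalty): the unspecific gene 3 is kept.
    print(best_submatrix(M, max_cells=2, kappa=0.0))  # (10.5, (0, 1, 3), (0, 1))
    # With the out-of-group penalty, only the genes specific to cells 0-1 remain.
    print(best_submatrix(M, max_cells=2, kappa=2.0))  # (6.5, (0, 1), (0, 1))
```

The contrast between the two calls shows why a plain max-sum submatrix is not enough for specificity: without a penalty, a gene that is high everywhere still adds to the sum and is selected.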

RESULTS

We show through controlled experiments on scRNA-seq data that the MicroCellClust method achieves a high F1 score when identifying rare subpopulations of artificially planted human T cells. The effectiveness of MicroCellClust is further confirmed on breast cancer samples, in which it reveals a subpopulation of CD4 T cells with a specific phenotype, as well as a subpopulation linked to a specific stage of the cell cycle. Finally, three rare subpopulations in mouse embryonic stem cells are also identified with MicroCellClust. These results illustrate that the proposed method outperforms typical alternatives at identifying small subsets of cells with highly specific expression profiles.
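For reference, an F1 score of this kind compares the set of cells a method returns with the set of cells that were planted: precision is the fraction of returned cells that were planted, recall is the fraction of planted cells that were returned, and F1 is their harmonic mean, which simplifies to 2|A ∩ B| / (|A| + |B|). The minimal sketch below assumes cells are identified by hypothetical integer IDs; the paper's exact evaluation protocol may differ.

```python
from typing import Hashable, Iterable


def f1_score(predicted: Iterable[Hashable], planted: Iterable[Hashable]) -> float:
    """F1 between a predicted cell set and the planted (ground-truth) cell set."""
    pred, truth = set(predicted), set(planted)
    if not pred or not truth:
        return 0.0
    tp = len(pred & truth)  # planted cells that were recovered
    # Harmonic mean of precision tp/|pred| and recall tp/|truth|.
    return 2 * tp / (len(pred) + len(truth))


if __name__ == "__main__":
    planted = {10, 11, 12, 13, 14}          # 5 planted cells (hypothetical IDs)
    predicted = {10, 11, 12, 13, 20, 21}    # 4 recovered, 2 false positives
    print(round(f1_score(predicted, planted), 3))  # 0.727
```

Here precision is 4/6 and recall is 4/5, so F1 = 8/11, which penalizes both missed planted cells and extra cells pulled into the subpopulation.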

AVAILABILITY AND IMPLEMENTATION

The R and Scala implementation of MicroCellClust is freely available on GitHub at https://github.com/agerniers/MicroCellClust/. The data underlying this article are available on Zenodo at https://dx.doi.org/10.5281/zenodo.4580332.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3325/8504615/df2924dd6bc0/btab239f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3325/8504615/df671e78d983/btab239f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3325/8504615/d9be4fbe7c3b/btab239f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3325/8504615/f0b2de9555f3/btab239f4.jpg

Similar articles

1
MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data.
Bioinformatics. 2021 Oct 11;37(19):3220-3227. doi: 10.1093/bioinformatics/btab239.
2
scCross: efficient search for rare subpopulations across multiple single-cell samples.
Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae371.
3
Imputing dropouts for single-cell RNA sequencing based on multi-objective optimization.
Bioinformatics. 2022 Jun 13;38(12):3222-3230. doi: 10.1093/bioinformatics/btac300.
4
scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling.
Bioinformatics. 2021 Jul 12;37(Suppl_1):i358-i366. doi: 10.1093/bioinformatics/btab273.
5
A machine learning-based method for automatically identifying novel cells in annotating single-cell RNA-seq data.
Bioinformatics. 2022 Oct 31;38(21):4885-4892. doi: 10.1093/bioinformatics/btac617.
6
Identify, quantify and characterize cellular communication from single-cell RNA sequencing data with scSeqComm.
Bioinformatics. 2022 Mar 28;38(7):1920-1929. doi: 10.1093/bioinformatics/btac036.
7
scGAC: a graph attentional architecture for clustering single-cell RNA-seq data.
Bioinformatics. 2022 Apr 12;38(8):2187-2193. doi: 10.1093/bioinformatics/btac099.
8
An Effective Biclustering-Based Framework for Identifying Cell Subpopulations From scRNA-seq Data.
IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2249-2260. doi: 10.1109/TCBB.2020.2979717. Epub 2021 Dec 8.
9
scGate: marker-based purification of cell types from heterogeneous single-cell RNA-seq datasets.
Bioinformatics. 2022 Apr 28;38(9):2642-2644. doi: 10.1093/bioinformatics/btac141.
10
PheneBank: a literature-based database of phenotypes.
Bioinformatics. 2022 Jan 27;38(4):1179-1180. doi: 10.1093/bioinformatics/btab740.

Cited by

1
Leveraging gene correlations in single cell transcriptomic data.
BMC Bioinformatics. 2024 Sep 18;25(1):305. doi: 10.1186/s12859-024-05926-z.
2
scCAD: Cluster decomposition-based anomaly detection for rare cell identification in single-cell expression data.
Nat Commun. 2024 Aug 31;15(1):7561. doi: 10.1038/s41467-024-51891-9.
3
Single-cell omics: experimental workflow, data analyses and applications.
Sci China Life Sci. 2025 Jan;68(1):5-102. doi: 10.1007/s11427-023-2561-0. Epub 2024 Jul 23.
4
scCross: efficient search for rare subpopulations across multiple single-cell samples.
Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae371.
5
Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad335.
6
ProgClust: A progressive clustering method to identify cell populations.
Front Genet. 2023 Apr 6;14:1183099. doi: 10.3389/fgene.2023.1183099. eCollection 2023.
7
Leveraging gene correlations in single cell transcriptomic data.
bioRxiv. 2023 Nov 1:2023.03.14.532643. doi: 10.1101/2023.03.14.532643.
8
Gene differential co-expression analysis of male infertility patients based on statistical and machine learning methods.
Front Microbiol. 2023 Jan 27;14:1092143. doi: 10.3389/fmicb.2023.1092143. eCollection 2023.