MicroCellClust: mining rare and highly specific subpopulations from single-cell expression data.

Authors

Gerniers Alexander, Bricard Orian, Dupont Pierre

Affiliations

ICTEAM/INGI/Artificial Intelligence and Algorithms Group, UCLouvain, Louvain-la-Neuve 1348, Belgium.

de Duve Institute, UCLouvain, Brussels 1200, Belgium.

Publication information

Bioinformatics. 2021 Oct 11;37(19):3220-3227. doi: 10.1093/bioinformatics/btab239.

DOI: 10.1093/bioinformatics/btab239

PMID: 33830183

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8504615/

Abstract

MOTIVATION

Identifying rare subpopulations of cells is a critical step in extracting knowledge from single-cell expression data, especially when the available data are limited and rare subpopulations contain only a few cells. In this paper, we present a data mining method to identify small subpopulations of cells that present highly specific expression profiles. This objective is formalized as a constrained optimization problem that jointly identifies a small group of cells and a corresponding subset of specific genes. The proposed method extends the max-sum submatrix problem to yield genes that are, for instance, highly expressed inside a small number of cells, but have a low expression in the remaining ones.
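To make the idea of jointly selecting cells and genes concrete, the following is a minimal sketch and not the authors' formulation or implementation. It assumes a genes-by-cells matrix whose values are shifted so that low expression is negative, a hypothetical penalty weight `kappa` on positive expression outside the selected cells, and a simple cap on group size in place of the paper's constraints, which the abstract does not spell out. The exhaustive search is only workable on toy inputs.

```python
from itertools import combinations


def score(matrix, cells, genes, kappa=1.0):
    """Sum of the selected entries, minus kappa times the positive
    expression of the selected genes in cells outside the group.
    (Illustrative objective; kappa is a hypothetical parameter.)"""
    n_cells = len(matrix[0])
    inside = sum(matrix[g][c] for g in genes for c in cells)
    outside = sum(
        max(matrix[g][c], 0.0)
        for g in genes
        for c in range(n_cells)
        if c not in cells
    )
    return inside - kappa * outside


def best_bicluster(matrix, max_cells=2, kappa=1.0):
    """Brute-force search over all cell groups of size <= max_cells.
    For a fixed cell group the score is additive over genes, so the
    best gene set is every gene with a positive individual score."""
    n_genes, n_cells = len(matrix), len(matrix[0])
    best = (0.0, (), ())
    for size in range(1, max_cells + 1):
        for cells in combinations(range(n_cells), size):
            genes = tuple(
                g for g in range(n_genes)
                if score(matrix, cells, (g,), kappa) > 0
            )
            total = score(matrix, cells, genes, kappa)
            if total > best[0]:
                best = (total, cells, genes)
    return best


# Toy data: rows are genes, columns are cells.
toy = [
    [3.0, 2.5, -1.0, -1.2, -0.8],    # high only in cells 0 and 1
    [2.0, 2.2, -0.9, -1.1, -1.0],    # high only in cells 0 and 1
    [1.0, -1.0, 1.2, 0.9, -1.0],     # expressed in scattered cells
    [-1.0, -0.8, -1.1, -0.9, -1.2],  # low everywhere
]

total, cells, genes = best_bicluster(toy, max_cells=2)
print(cells, genes)
```

On this toy matrix the search selects cells 0 and 1 together with genes 0 and 1: the third gene is rejected because it is also expressed outside the group, which is the behaviour the out-of-group penalty is meant to illustrate.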

RESULTS

We show through controlled experiments on scRNA-seq data that the MicroCellClust method achieves a high F1 score when identifying rare subpopulations of artificially planted human T cells. The effectiveness of MicroCellClust is confirmed on breast cancer samples, where it reveals a subpopulation of CD4 T cells with a specific phenotype as well as a subpopulation linked to a specific stage in the cell cycle. Finally, three rare subpopulations in mouse embryonic stem cells are also identified with MicroCellClust. These results illustrate that the proposed method outperforms typical alternatives at identifying small subsets of cells with highly specific expression profiles.
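As a reminder of what an F1 score measures in a planted-subpopulation experiment, the sketch below compares a set of recovered cells with the set of planted cells. The cell identifiers and the example sets are invented for illustration and are not taken from the paper's experiments.

```python
def f1_score(predicted, planted):
    """F1 score of a recovered cell set against the planted one:
    the harmonic mean of precision and recall."""
    predicted, planted = set(predicted), set(planted)
    true_pos = len(predicted & planted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(planted)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 5 planted cells, 4 recovered, 3 of them correct.
planted = {"c1", "c2", "c3", "c4", "c5"}
recovered = {"c1", "c2", "c3", "c9"}
print(round(f1_score(recovered, planted), 3))
```

Here precision is 3/4 and recall is 3/5, so the score is 2/3. With rare subpopulations of only a few cells, each missed or spurious cell moves this value noticeably.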

AVAILABILITY AND IMPLEMENTATION

The R and Scala implementation of MicroCellClust is freely available on GitHub, at https://github.com/agerniers/MicroCellClust/. The data underlying this article are available on Zenodo, at https://dx.doi.org/10.5281/zenodo.4580332.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
