Cluster stability in the analysis of mass cytometry data.

Authors

Melchiotti Rossella, Gracio Filipe, Kordasti Shahram, Todd Alan K, de Rinaldis Emanuele

Affiliations

Guy's and St Thomas' NHS Foundation Trust and King's College London, Translational Bioinformatics Platform - R&D Department. Biomedical Research Centre, London, SE1 9RT, United Kingdom.

Department of Haematological Medicine, Cancer Studies Division King's College London, Rayne Institute, London, SE5 9NU, United Kingdom.

Publication

Cytometry A. 2017 Jan;91(1):73-84. doi: 10.1002/cyto.a.23001. Epub 2016 Oct 18.

DOI:10.1002/cyto.a.23001

PMID:27754590

Abstract

Manual gating has traditionally been applied to cytometry data sets to identify cells based on protein expression. The advent of mass cytometry allows a higher number of proteins to be measured simultaneously on cells, thereby providing a means to define cell clusters in a high-dimensional expression space. This enhancement, whilst opening unprecedented opportunities for single-cell-level analyses, makes the incremental replacement of manual gating with automated clustering a compelling need. To this end, many methods have been implemented and successfully applied in different settings. However, the reproducibility of automatically generated clusters is proving challenging, and an analytical framework to distinguish spurious clusters from more stable, and presumably more biologically relevant, entities is still missing. One way to estimate the stability of cell clusters is to evaluate their consistent re-occurrence within and between algorithms, a metric commonly used to evaluate results from gene expression analysis. Herein we report the usage and importance of cluster stability evaluations when applied to results generated by three popular clustering algorithms (SPADE, FLOCK and PhenoGraph) run on four different data sets. These algorithms generated clusters with various degrees of statistical stability, many of which were unstable. By comparing the results of automated clustering with manually gated populations, we illustrate how information on cluster stability can support a more rigorous and informed interpretation of clustering results. We also explore the relationships between statistical stability and other properties such as cluster compactness and isolation, demonstrating that whilst cluster stability is linked to these properties, it cannot be reliably predicted by any of them.
Our study proposes the introduction of cluster stability as a necessary checkpoint for cluster interpretation and contributes to the construction of a more systematic and standardized analytical framework for the assessment of cytometry clustering results. © 2016 International Society for Advancement of Cytometry.
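The abstract measures stability as the consistent re-occurrence of a cluster within and between algorithms, but does not spell out the procedure. The following is only a minimal Python sketch under stated assumptions, not the authors' implementation: every clustering labels the same cells in the same order (for example, repeated runs of one algorithm with different seeds, or SPADE, FLOCK and PhenoGraph run on one data set), and a cluster's stability is its best-match Jaccard index with any cluster in each repeated clustering, averaged over repeats. This resembles Jaccard-based approaches such as the `clusterboot` function of the R package `fpc`, which uses bootstrap resampling rather than the repeated runs assumed here. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def clusters_from_labels(labels):
    """Group cell indices by cluster label: {label: set of cell indices}."""
    groups = defaultdict(set)
    for cell, label in enumerate(labels):
        groups[label].add(cell)
    return dict(groups)


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets of cell indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_jaccard(reference, other):
    """For each reference cluster, its highest Jaccard index with any cluster in `other`."""
    ref_groups = clusters_from_labels(reference)
    other_groups = clusters_from_labels(other)
    return {
        label: max(jaccard(cells, candidate) for candidate in other_groups.values())
        for label, cells in ref_groups.items()
    }


def cluster_stability(reference, repeats):
    """Average best-match Jaccard of each reference cluster over repeated clusterings.

    Assumption: every clustering labels the same cells in the same order.
    Values near 1 mean the cluster re-occurs consistently; low values flag
    clusters that are split or merged in other runs.
    """
    scores = defaultdict(list)
    for other in repeats:
        if len(other) != len(reference):
            raise ValueError("all clusterings must label the same cells")
        for label, score in best_match_jaccard(reference, other).items():
            scores[label].append(score)
    return {label: mean(values) for label, values in scores.items()}


if __name__ == "__main__":
    # Toy example: 8 cells, a reference clustering and two repeated clusterings.
    reference = [0, 0, 0, 1, 1, 1, 2, 2]
    repeats = [
        [5, 5, 5, 7, 7, 7, 7, 7],  # reference clusters 1 and 2 merged in this run
        [1, 1, 1, 2, 2, 2, 3, 3],  # same partition, different label names
    ]
    for label, score in sorted(cluster_stability(reference, repeats).items()):
        print(label, round(score, 2))
```

In the toy example, cluster 0 re-occurs exactly in both runs and scores 1.0, while clusters 1 and 2 are absorbed into one cluster in the first run and score about 0.8 and 0.7 respectively; the smaller cluster is penalised more because the merged cluster dilutes it further. Matching on cluster membership rather than label names makes the score independent of how each algorithm numbers its clusters. If runs are made on subsamples, the comparison would need to be restricted to cells present in both clusterings.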

Similar articles

Detecting clusters of different geometrical shapes in microarray gene expression data.

Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.

Clustering of gene expression data: performance and similarity analysis.

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S19. doi: 10.1186/1471-2105-7-S4-S19.

Comparison of clustering methods for high-dimensional single-cell flow and mass cytometry data.

Cytometry A. 2016 Dec;89(12):1084-1096. doi: 10.1002/cyto.a.23030. Epub 2016 Dec 19.

Model order selection for bio-molecular data clustering.

BMC Bioinformatics. 2007 May 3;8 Suppl 2(Suppl 2):S7. doi: 10.1186/1471-2105-8-S2-S7.

Evaluation of clustering algorithms for gene expression data.

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S17. doi: 10.1186/1471-2105-7-S4-S17.

Clustering of change patterns using Fourier coefficients.

Bioinformatics. 2008 Jan 15;24(2):184-91. doi: 10.1093/bioinformatics/btm568. Epub 2007 Nov 19.

Stability-based validation of clustering solutions.

Neural Comput. 2004 Jun;16(6):1299-323. doi: 10.1162/089976604773717621.

Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.

Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.

Studies on the Clustering Algorithm for Analyzing Gene Expression Data with a Bidirectional Penalty.

J Comput Biol. 2017 Jul;24(7):689-698. doi: 10.1089/cmb.2017.0051. Epub 2017 May 10.

Cited by

Unsupervised machine learning reveals key immune cell subsets in COVID-19, rhinovirus infection, and cancer therapy.

Elife. 2021 Aug 5;10:e64653. doi: 10.7554/eLife.64653.

TYRO3 induces anti-PD-1/PD-L1 therapy resistance by limiting innate immunity and tumoral ferroptosis.

J Clin Invest. 2021 Apr 15;131(8). doi: 10.1172/JCI139434.

Mass Cytometry Defines Virus-Specific CD4 T Cells in Influenza Vaccination.

Immunohorizons. 2020 Dec 11;4(12):774-788. doi: 10.4049/immunohorizons.1900097.

Unsupervised machine learning reveals key immune cell subsets in COVID-19, rhinovirus infection, and cancer therapy.

bioRxiv. 2020 Nov 4:2020.07.31.190454. doi: 10.1101/2020.07.31.190454.

Proteomics and bioinformatics analysis of Fasciola hepatica somatic proteome in different growth phases.

Parasitol Res. 2020 Sep;119(9):2837-2850. doi: 10.1007/s00436-020-06833-x. Epub 2020 Aug 5.

Unsupervised machine learning reveals risk stratifying glioblastoma tumor cells.

Elife. 2020 Jun 23;9:e56879. doi: 10.7554/eLife.56879.

Key steps and methods in the experimental design and data analysis of highly multi-parametric flow and mass cytometry.

Comput Struct Biotechnol J. 2020 Mar 31;18:874-886. doi: 10.1016/j.csbj.2020.03.024. eCollection 2020.

A comparison framework and guideline of clustering methods for mass cytometry data.

Genome Biol. 2019 Dec 23;20(1):297. doi: 10.1186/s13059-019-1917-7.

Combination anti-CTLA-4 plus anti-PD-1 checkpoint blockade utilizes cellular mechanisms partially distinct from monotherapies.

Proc Natl Acad Sci U S A. 2019 Nov 5;116(45):22699-22709. doi: 10.1073/pnas.1821218116. Epub 2019 Oct 21.

Automated subset identification and characterization pipeline for multidimensional flow and mass cytometry data clustering and visualization.

Commun Biol. 2019 Jun 20;2:229. doi: 10.1038/s42003-019-0467-6. eCollection 2019.
