Liu Peng, Pan Yuchen, Chang Hung-Ching, Wang Wenjia, Fang Yusi, Xue Xiangning, Zou Jian, Toothaker Jessica M, Olaloye Oluwabunmi, Santiago Eduardo Gonzalez, McCourt Black, Mitsialis Vanessa, Presicce Pietro, Kallapur Suhas G, Snapper Scott B, Liu Jia-Jun, Tseng George C, Konnikova Liza, Liu Silvia
Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US.
Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, 1400 Pressler St., Houston, TX 77030, US.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae633.
Cytometry is a technique for simultaneously identifying and quantifying many cell-surface and intracellular proteins at single-cell resolution. Analyzing high-dimensional cytometry data involves identifying and quantifying cell populations based on their marker expression. This study provides a quantitative review and comparison of approaches for phenotyping cell populations in cytometry data: manual gating, unsupervised clustering, and supervised auto-gating. Six datasets from diverse species and sample types were included, and manual gating with two hierarchical layers served as the ground truth for evaluation. For manual gating, results from five researchers were compared to illustrate gating consistency among raters. For unsupervised clustering, 23 tools were quantitatively compared in terms of agreement with the ground truth and computing cost. While no method outperformed all others, several tools, including PAC-MAN, CCAST, FlowSOM, flowClust, and DEPECHE, generally demonstrated strong performance. For supervised auto-gating, four algorithms were evaluated, of which DeepCyTOF and CyTOF Linear Classifier performed best. We further provide practical recommendations on prioritizing gating methods for different application scenarios. This study offers comprehensive insights to help biologists understand diverse gating methods and choose those best suited to their applications.
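To make the evaluation idea concrete, the following minimal sketch shows one common way to score cluster assignments against manual gating labels. The abstract does not say which accuracy measure the study used; the adjusted Rand index (ARI) here is an assumption chosen for illustration, and the function name, label values, and toy data are all hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same cells.

    Illustrative metric only; the study's actual accuracy measure
    is not stated in the abstract.
    """
    if len(truth) != len(predicted):
        raise ValueError("label lists must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # no cell pairs to disagree on

    # Count cell pairs that share a label in both, or in each, labeling.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(predicted).values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = pairs_truth * pairs_pred / comb(n, 2)
    max_index = (pairs_truth + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single group in both labelings
    return (pairs_both - expected) / (max_index - expected)


# Hypothetical toy data: manual gates for six cells and cluster IDs from a tool.
manual_gates = ["T", "T", "T", "B", "B", "NK"]
cluster_ids = [0, 0, 1, 2, 2, 2]
score = adjusted_rand_index(manual_gates, cluster_ids)
```

ARI does not depend on how clusters are named, so a tool's arbitrary cluster IDs can be compared directly with named manual gates. A score of 1 indicates identical partitions, and values near 0 indicate agreement no better than chance.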