Veneto Institute of Oncology IOV-IRCCS, Padova, Italy.
Section of Immunology, Department of Medicine, University of Verona, Verona, Italy.
Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae322.
Cytometry comprises powerful techniques for analyzing the cellular heterogeneity of a biological sample by measuring the expression of protein markers. These technologies are especially important in oncoimmunology, where cell identification is essential for analyzing the tumor microenvironment. Several classification tools have been developed for annotating cytometry datasets. They include supervised tools that require a training set as a reference (i.e. reference-based) and semisupervised tools based on a manually defined marker table. The latter are closer to the traditional annotation of cytometry data by manual gating. However, they require the marker table to be defined by hand, because it cannot be extracted automatically from a reference dataset. Methods are therefore lacking that support both classification approaches while maintaining the high biological interpretability given by the marker table.
We present GateMeClass (Gate Mining and Classification), a new tool that overcomes this limitation of current cytometry classification methods by allowing both semisupervised and supervised annotation based on a marker table, which can be defined manually or extracted from an external annotated dataset. We measured the accuracy of GateMeClass in annotating three well-established benchmark mass cytometry datasets and one flow cytometry dataset. Its performance is comparable to that of reference-based methods and marker table-based techniques, while offering greater flexibility and rapid execution times.
GateMeClass is implemented in R and is publicly available at https://github.com/simo1c/GateMeClass.
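To make the idea of a marker table concrete, the following is a minimal Python sketch of marker table-based annotation in general. It is not the GateMeClass algorithm or its R interface. The function names, the fixed per-marker thresholds, and the toy expression values are all assumptions made for illustration. The sketch shows the two routes described above: a table written by hand, and a table extracted from an annotated reference dataset.

```python
from statistics import median

# Hypothetical positivity cutoffs per marker (illustrative values only).
thresholds = {"CD3": 1.0, "CD19": 1.0}


def binarize(cells, thresholds):
    """Convert each cell's expression values into '+'/'-' calls per marker."""
    return [
        {m: "+" if v >= thresholds[m] else "-" for m, v in cell.items()}
        for cell in cells
    ]


def extract_marker_table(ref_cells, ref_labels, thresholds):
    """Derive a marker table from an annotated reference dataset.

    Assumption: a marker is '+' for a population when the population's
    median expression is at or above the marker's threshold.
    """
    table = {}
    for label in sorted(set(ref_labels)):
        members = [c for c, l in zip(ref_cells, ref_labels) if l == label]
        table[label] = {
            m: "+" if median(c[m] for c in members) >= thresholds[m] else "-"
            for m in thresholds
        }
    return table


def classify(cells, marker_table, thresholds):
    """Label each cell with the single population whose signature it matches."""
    labels = []
    for signs in binarize(cells, thresholds):
        hits = [
            label
            for label, signature in marker_table.items()
            if all(signs[m] == s for m, s in signature.items())
        ]
        labels.append(hits[0] if len(hits) == 1 else "unclassified")
    return labels


# Route 1: a manually defined marker table (CD3 marks T cells, CD19 B cells).
manual_table = {
    "T cell": {"CD3": "+", "CD19": "-"},
    "B cell": {"CD3": "-", "CD19": "+"},
}

# Route 2: the same kind of table extracted from a toy annotated reference.
ref_cells = [
    {"CD3": 2.0, "CD19": 0.0},
    {"CD3": 3.0, "CD19": 0.3},
    {"CD3": 0.1, "CD19": 2.2},
    {"CD3": 0.4, "CD19": 2.8},
]
ref_labels = ["T cell", "T cell", "B cell", "B cell"]
extracted_table = extract_marker_table(ref_cells, ref_labels, thresholds)

# Toy cells to annotate; the last one is negative for both markers.
cells = [
    {"CD3": 2.5, "CD19": 0.1},
    {"CD3": 0.2, "CD19": 3.0},
    {"CD3": 0.1, "CD19": 0.2},
]
predicted = classify(cells, extracted_table, thresholds)
```

In this toy setting the extracted table coincides with the manual one, so either can be passed to `classify`. Real data would need a data-driven way to decide what counts as positive for each marker, which this sketch replaces with fixed cutoffs.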