KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium.
Center for Microbial Ecology and Technology, Department of Biotechnology, Ghent University, 9000 Ghent, Belgium.
Cytometry A. 2019 Jul;95(7):782-791. doi: 10.1002/cyto.a.23792. Epub 2019 May 17.
Recent years have seen growing interest in data analysis techniques for the automated identification of cell populations in cytometry. These techniques depend heavily on a distance metric, a function that quantifies the distance between single-cell measurements. In most cases, researchers simply use the Euclidean distance. In this article, we exploit the availability of single-cell labels to learn an optimal Mahalanobis distance metric from the data. We show that this learned Mahalanobis metric improves the identification of cell populations compared with the Euclidean metric. Once determined, it can be reused to analyze multiple samples measured under the same experimental setup. We illustrate the approach on cytometry data from two different origins: flow cytometry applied to microbial cells and mass cytometry applied to human blood cells. We also show that the learned metric improves the identification of cell populations when clustering methods are employed. More generally, these results imply that the performance of data analysis techniques can be improved by using a more advanced distance metric. © 2019 International Society for Advancement of Cytometry.
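To make the contrast between the two metrics concrete, the following is a minimal sketch, not the procedure used in the article. A Mahalanobis distance is parameterized by a positive semidefinite matrix M, with d_M(x, y) = sqrt((x - y)^T M (x - y)); the Euclidean distance is the special case where M is the identity. The abstract does not say how M is learned, so the sketch substitutes a simple label-informed stand-in: the inverse of the pooled within-class covariance. The function names, the `ridge` parameter, and the synthetic two-channel data are all assumptions made for illustration.

```python
import numpy as np


def mahalanobis(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ M @ d))


def within_class_metric(X, labels, ridge=1e-6):
    """Illustrative label-informed metric (an assumption, not the article's method).

    Returns the inverse of the pooled within-class covariance, so that
    channels that vary a lot inside a population are down-weighted.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, p = X.shape
    scatter = np.zeros((p, p))
    for c in np.unique(labels):
        Xc = X[labels == c]
        centered = Xc - Xc.mean(axis=0)
        scatter += centered.T @ centered
    pooled_cov = scatter / n
    # A small ridge term keeps the matrix invertible.
    return np.linalg.inv(pooled_cov + ridge * np.eye(p))


# Hypothetical synthetic data: channel 0 separates the two populations,
# channel 1 is high-variance noise shared by both.
rng = np.random.default_rng(0)
pop_a = rng.normal([0.0, 0.0], [0.5, 5.0], size=(200, 2))
pop_b = rng.normal([3.0, 0.0], [0.5, 5.0], size=(200, 2))
X = np.vstack([pop_a, pop_b])
labels = np.array([0] * 200 + [1] * 200)

M = within_class_metric(X, labels)
identity = np.eye(2)  # M = I recovers the Euclidean distance

# Compare the two metrics on one cell from each population.
d_euclid = mahalanobis(pop_a[0], pop_b[0], identity)
d_learned = mahalanobis(pop_a[0], pop_b[0], M)
```

In this toy setting the resulting M weights the informative channel far more than the noisy one, which is the intuition behind preferring a data-derived metric over the Euclidean default. Because M depends only on the labeled training cells, the same matrix could in principle be reused for other samples measured under the same setup, as the abstract describes.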