Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA.
Corporate/Research, Still Pond Cytomics, West Chester, PA, USA.
Cytometry A. 2021 Feb;99(2):133-144. doi: 10.1002/cyto.a.24307.
Automated clustering workflows are increasingly used for the analysis of high parameter flow cytometry data. This trend calls for algorithms that can quickly process tens of millions of data points, compare results across subjects or time points, and provide easily actionable interpretations of the results. To this end, we created Tailor, a model-based clustering algorithm specialized for flow cytometry data. Our approach leverages a phenotype-aware binning scheme to provide a coarse model of the data, which is then refined using a multivariate Gaussian mixture model. We benchmark Tailor using a simulation study and two flow cytometry data sets, and show that the results are robust to moderate departures from normality and to inter-sample variation. Moreover, Tailor provides automated, non-overlapping annotations of its clusters, which facilitates interpretation of results and downstream analysis. Tailor is released as an R package, and the source code is publicly available at www.github.com/matei-ionita/Tailor.
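The two-stage idea in the abstract can be illustrated as follows. First, a coarse phenotype binning seeds the components. Then a Gaussian mixture is refined by expectation-maximization (EM). The sketch below is only an illustration of that idea under stated assumptions. It is not Tailor's implementation, which is an R package. The fixed per-marker cutoffs, the `min_count` filter, the diagonal covariances, the marker names, and every function name here are hypothetical simplifications chosen for brevity.

```python
import math
import random
from collections import defaultdict


def phenotype_label(event, markers, thresholds):
    """Hypothetical binning rule: call each marker '+' or '-' against a fixed cutoff."""
    return "".join(m + ("+" if x > t else "-") for m, x, t in zip(markers, event, thresholds))


def bin_by_phenotype(data, markers, thresholds, min_count):
    """Coarse model: group events by phenotype and keep only well-populated bins."""
    bins = defaultdict(list)
    for event in data:
        bins[phenotype_label(event, markers, thresholds)].append(event)
    return {lab: pts for lab, pts in bins.items() if len(pts) >= min_count}


def init_components(bins):
    """One Gaussian per kept bin, seeded with the bin's mean, variance and relative size."""
    kept = sum(len(pts) for pts in bins.values())
    comps = []
    for label, pts in bins.items():
        d = len(pts[0])
        mean = [sum(p[j] for p in pts) / len(pts) for j in range(d)]
        var = [max(sum((p[j] - mean[j]) ** 2 for p in pts) / len(pts), 1e-6) for j in range(d)]
        comps.append({"label": label, "weight": len(pts) / kept, "mean": mean, "var": var})
    return comps


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (a simplifying assumption)."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def responsibilities(x, comps):
    """Posterior probability of each component for one event, via log-sum-exp."""
    logs = [math.log(c["weight"]) + log_gauss_diag(x, c["mean"], c["var"]) for c in comps]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]
    s = sum(w)
    return [v / s for v in w]


def em_refine(data, comps, n_iter=20):
    """Refine the binned starting point with standard EM updates."""
    n, d = len(data), len(data[0])
    for _ in range(n_iter):
        resp = [responsibilities(x, comps) for x in data]  # E-step
        for k, c in enumerate(comps):                       # M-step
            nk = sum(r[k] for r in resp)
            c["weight"] = max(nk / n, 1e-12)
            if nk < 1e-9:
                continue  # empty component: keep its previous mean and variance
            c["mean"] = [sum(r[k] * x[j] for r, x in zip(resp, data)) / nk for j in range(d)]
            c["var"] = [max(sum(r[k] * (x[j] - c["mean"][j]) ** 2
                                for r, x in zip(resp, data)) / nk, 1e-6)
                        for j in range(d)]
    return comps


def annotate(x, comps):
    """Assign an event to its most probable component and return that component's label."""
    r = responsibilities(x, comps)
    return comps[max(range(len(comps)), key=r.__getitem__)]["label"]


# Toy data: three simulated populations on two hypothetical markers.
random.seed(0)
markers, cutoffs = ["CD4", "CD8"], [3.0, 3.0]
centers = [(5.0, 1.0), (1.0, 5.0), (1.0, 1.0)]
data = [(random.gauss(cx, 0.5), random.gauss(cy, 0.5)) for cx, cy in centers for _ in range(300)]
comps = em_refine(data, init_components(bin_by_phenotype(data, markers, cutoffs, min_count=20)))
print(sorted(c["label"] for c in comps))
print(annotate((4.8, 0.9), comps))
```

In this sketch, each mixture component keeps the phenotype name of the bin that seeded it. Hard assignment therefore gives every event exactly one label, which loosely mirrors the non-overlapping annotations described above. If EM moves a component far from its seed bin, however, its inherited label may no longer match its location. A real tool would need to re-check labels after refinement.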