Wang Zexuan, Chen Jiong, Ionita Matei, Zhan Qipeng, Zhou Zhuoping, Shen Li
Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, Philadelphia, PA, United States.
Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, United States.
Exp Biol Med (Maywood). 2025 May 21;250:10445. doi: 10.3389/ebm.2025.10445. eCollection 2025.
Mass cytometry enables high-throughput characterization of heterogeneous cell populations at single-cell resolution, using metal isotopes to capture cellular signals and avoiding the spectral overlap common in flow cytometry. Despite advancements, conventional data analysis often focuses on manual gating or clustering within specific samples, overlooking disparities across subjects or biological samples. To address this gap, we propose a novel framework that treats the cell-by-protein matrix as a high-dimensional distribution, using Quantized Optimal Transport (QOT) to quantify distances between samples based on their cellular protein expression profiles. This approach allows for a direct comparison of distributions without relying on predefined gating strategies, capturing subtle variations in the data. We validated our method through two experiments using real-world time-series Coronavirus Disease 2019 (COVID-19) cytometry data. First, we conducted a leave-one-out analysis to identify immunologically unstable proteins over time, revealing CD3 and CD45 as the proteins changing the most during the vaccine response. Second, we aimed to capture individual immune fingerprints over time by calculating pairwise Wasserstein distances between samples and applying hierarchical clustering. Using silhouette scores to evaluate clustering effectiveness, we identified optimal combinations of immunological markers that effectively grouped samples from the same participant across different time points. Our findings demonstrate that the QOT framework provides a robust and flexible tool for cohort-level analysis of mass cytometry data, enabling the identification of unstable immunological markers and capturing immune response heterogeneity among vaccinated cohorts.
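The pipeline described in the abstract (quantize each sample, compute pairwise Wasserstein distances, then cluster the samples hierarchically) can be illustrated with a minimal sketch. This is not the authors' implementation: the k-means quantizer, the linear-programming solver for the transport problem, the number of centroids, and the synthetic Gaussian "samples" are all assumptions made for illustration, and the function names `quantize`, `wasserstein`, and `pairwise_distances` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.optimize import linprog
from scipy.spatial.distance import cdist, squareform


def quantize(cells, k, n_iter=20, seed=0):
    """Summarize a cell-by-protein matrix as k weighted centroids (plain k-means)."""
    rng = np.random.default_rng(seed)
    centroids = cells[rng.choice(len(cells), size=k, replace=False)]
    for _ in range(n_iter):
        labels = cdist(cells, centroids).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = cells[labels == j].mean(axis=0)
    labels = cdist(cells, centroids).argmin(axis=1)
    weights = np.bincount(labels, minlength=k) / len(cells)
    keep = weights > 0  # drop empty clusters, if any
    return centroids[keep], weights[keep]


def wasserstein(xa, wa, xb, wb):
    """Exact 2-Wasserstein distance between two weighted point sets, solved as an LP."""
    cost = cdist(xa, xb, metric="sqeuclidean")
    n, m = cost.shape
    # Equality constraints: rows of the transport plan sum to wa, columns to wb.
    a_eq = np.zeros((n + m, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        a_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([wa, wb])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return float(np.sqrt(max(res.fun, 0.0)))


def pairwise_distances(samples, k=8):
    """Symmetric matrix of Wasserstein distances between quantized samples."""
    quantized = [quantize(s, k) for s in samples]
    n = len(samples)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = wasserstein(*quantized[i], *quantized[j])
    return dist


# Synthetic stand-in data (an assumption): two hypothetical participants,
# two time points each, 300 cells x 5 proteins per sample.
rng = np.random.default_rng(0)
samples = [rng.normal(loc=mu, scale=1.0, size=(300, 5)) for mu in (0.0, 0.0, 4.0, 4.0)]

dist = pairwise_distances(samples)
# Average-linkage hierarchical clustering on the precomputed distances, cut into 2 groups.
labels = fcluster(linkage(squareform(dist), method="average"), t=2, criterion="maxclust")
print(labels)
```

In this toy setting the two samples drawn around the same mean end up in the same cluster, which mirrors the idea of grouping samples from the same participant across time points. Clustering quality on a precomputed distance matrix could then be scored with, for example, scikit-learn's `silhouette_score(dist, labels, metric="precomputed")`.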