Burke Paulo E P, Strange Ann, Monk Emily, Thompson Brian, Amato Carol M, Woods David M
Division of Medical Oncology, Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045, USA.
Bioinform Adv. 2022 Aug 3;2(1):vbac052. doi: 10.1093/bioadv/vbac052. eCollection 2022.
High-dimensional cytometry assays can simultaneously measure dozens of markers, enabling the investigation of complex phenotypes. However, because manual gating relies on prior biological knowledge, only a few marker combinations are typically assessed. As a result, complex phenotypes with potential biological relevance are overlooked. Here, we present PhenoComb, an R package that allows agnostic exploration of phenotypes by assessing all combinations of markers. PhenoComb uses signal intensity thresholds to assign markers to discrete states (e.g. negative, low, high) and then counts the number of cells per sample for all possible marker combinations in a memory-safe manner. Time and disk space are the only constraints on the number of markers evaluated. PhenoComb also provides several approaches to perform statistical comparisons, evaluate the relevance of phenotypes and assess the independence of identified phenotypes. Users can guide the analysis by adjusting several function arguments, for example to identify parent populations of interest, filter low-frequency populations and define a maximum phenotype complexity to evaluate. We designed PhenoComb to run on either a local computer or a server.
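The core idea of thresholding markers into discrete states and then counting cells for every marker combination can be illustrated with a minimal Python sketch. This is not PhenoComb's R implementation or API: the function names, the marker names (CD4, CD8) and the threshold values are hypothetical. The sketch also keeps all counts in memory for a single sample, whereas the package is described as memory-safe.

```python
from collections import Counter
from itertools import combinations


def discretize(value, thresholds):
    """Return a state index: the number of thresholds the value meets or exceeds.

    With one threshold the states are 0 (negative) and 1 (positive);
    with two thresholds they are 0 (negative), 1 (low) and 2 (high).
    """
    return sum(value >= t for t in thresholds)


def count_phenotypes(cells, thresholds, max_markers=None):
    """Count cells for every combination of markers and discrete states.

    cells       -- list of dicts mapping marker name to signal intensity
    thresholds  -- dict mapping marker name to a sorted list of thresholds
    max_markers -- optional cap on phenotype complexity (markers per phenotype)
    """
    markers = sorted(thresholds)
    if max_markers is None:
        max_markers = len(markers)
    counts = Counter()
    for cell in cells:
        states = {m: discretize(cell[m], thresholds[m]) for m in markers}
        # A cell contributes to one phenotype per non-empty marker subset.
        for k in range(1, max_markers + 1):
            for subset in combinations(markers, k):
                counts[tuple((m, states[m]) for m in subset)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy sample with two markers and one threshold each.
    cells = [
        {"CD4": 0.1, "CD8": 2.0},
        {"CD4": 1.5, "CD8": 2.5},
        {"CD4": 1.7, "CD8": 0.2},
    ]
    thresholds = {"CD4": [1.0], "CD8": [1.0]}
    for phenotype, n in sorted(count_phenotypes(cells, thresholds).items()):
        print(phenotype, n)
```

The `max_markers` argument mirrors the idea of capping phenotype complexity. Filtering low-frequency populations could be applied afterwards by dropping entries whose count falls below a chosen cutoff.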
When we tested PhenoComb's performance on synthetic datasets, computation on 16 markers completed on the scale of minutes, and computation on up to 26 markers on the scale of hours. We applied PhenoComb to two publicly available datasets: an HIV flow cytometry dataset (12 markers and 421 samples) and the COVIDome CyTOF dataset (40 markers and 99 samples). In the HIV dataset, PhenoComb identified immune phenotypes associated with HIV seroconversion, including those highlighted in the original publication. In the COVID dataset, we identified several immune phenotypes with altered frequencies in infected individuals relative to healthy individuals. Collectively, these results show that PhenoComb is a powerful discovery tool for agnostically assessing high-dimensional single-cell data.
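The jump from minutes to hours between 16 and 26 markers is consistent with how quickly the phenotype space grows. As a rough illustration, assume each marker is either left out of a phenotype or takes one of k discrete states. The number of distinct non-empty phenotypes over n markers is then (k + 1)^n - 1. The abstract does not report how many states per marker the synthetic benchmarks used, so the value k = 2 below is an assumption chosen only for illustration.

```python
def n_phenotypes(n_markers, n_states):
    """Number of non-empty phenotypes when each marker is either ignored
    or fixed to one of n_states discrete states."""
    return (n_states + 1) ** n_markers - 1


if __name__ == "__main__":
    # Assumed: two states (negative/positive) per marker.
    for n in (12, 16, 26):
        print(n, "markers:", n_phenotypes(n, 2), "phenotypes")
```

Under this assumption, each additional marker triples the number of phenotypes, which is why options such as a maximum phenotype complexity or a low-frequency filter matter for larger panels.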
The PhenoComb R package can be downloaded from https://github.com/SciOmicsLab/PhenoComb.
Supplementary data are available at Bioinformatics Advances online.