Tursi Amanda R, Lages Celine S, Quayle Kenneth, Koenig Zachary T, Loni Rashi, Eswar Shruti, Cobeña-Reyes José, Thornton Sherry, Tilburgs Tamara, Andorf Sandra
Department of Biomedical Informatics, University of Cincinnati College of Medicine, Cincinnati, OH, USA.
Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
bioRxiv. 2025 Mar 14:2025.03.11.639902. doi: 10.1101/2025.03.11.639902.
Advances in cytometry have led to increases in the number of cellular markers that are routinely measured. The resulting complexity of the data has prompted a shift from manual to automated analysis methods. Currently, numerous unsupervised methods are available to cluster cells based on marker expression values. However, phenotyping the resulting clusters is typically not part of the automated process. Manually identifying both marker definitions (e.g. CD4, CCR7, CD45RA, CD19) and descriptive cell type names (e.g. naïve CD4 T cells) based on marker expression values can be time-consuming, subjective, and error-prone. In this work we propose an algorithm that addresses these problems through the creation of an automated tool, CytoPheno, that assigns marker definitions and cell type names to unidentified clusters. First, post-clustered expression data undergoes per-marker calculations to assign markers as positive or negative. Next, marker names undergo a standardization process to match to Protein Ontology identifier terms. Finally, marker descriptions are matched to cell type names within the Cell Ontology. Each part of the tool was tested with benchmark data to demonstrate performance. Additionally, the tool is encompassed in a graphical user interface (R Shiny) to increase user accessibility and interpretability. Overall, CytoPheno can aid researchers in timely and unbiased phenotyping of post-clustered cytometry data.
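The three steps named in the abstract (positive/negative marker calls, marker name standardization, and matching to cell type names) can be illustrated with a minimal Python sketch. This is not CytoPheno's implementation. The abstract does not specify the per-marker calculation, so the midpoint threshold below is an assumed stand-in, and the expression values, the synonym table, and the reference definitions are hypothetical toy data rather than Protein Ontology or Cell Ontology content.

```python
# Toy sketch of post-clustering phenotyping; all data and rules are illustrative assumptions.

# Hypothetical per-cluster median expression values (arbitrary units), with messy marker names.
raw_medians = {
    "cluster_1": {"cd4": 3.1, "CD197": 2.8, "CD45-RA": 2.9, "CD19": 0.2},
    "cluster_2": {"cd4": 0.3, "CD197": 0.4, "CD45-RA": 2.5, "CD19": 3.0},
    "cluster_3": {"cd4": 2.9, "CD197": 0.3, "CD45-RA": 0.2, "CD19": 0.1},
}

# Hypothetical synonym table; a real tool would map names to Protein Ontology terms.
SYNONYMS = {"CD197": "CCR7"}

# Hypothetical reference definitions; a real tool would query the Cell Ontology.
REFERENCE = {
    "naive CD4 T cell": {"CD4": "+", "CCR7": "+", "CD45RA": "+", "CD19": "-"},
    "B cell": {"CD19": "+", "CD4": "-"},
    "effector memory CD4 T cell": {"CD4": "+", "CCR7": "-", "CD45RA": "-", "CD19": "-"},
}


def standardize(name):
    """Normalize case and punctuation, then apply the synonym table."""
    key = name.strip().upper().replace("-", "").replace(" ", "")
    return SYNONYMS.get(key, key)


def call_markers(medians):
    """Call each marker '+' or '-' per cluster.

    Assumed rule: a marker is positive when the cluster median exceeds the
    midpoint between the lowest and highest cluster medians for that marker.
    """
    markers = sorted({m for row in medians.values() for m in row})
    thresholds = {}
    for m in markers:
        values = [row[m] for row in medians.values()]
        thresholds[m] = (min(values) + max(values)) / 2
    return {
        cluster: {m: "+" if row[m] > thresholds[m] else "-" for m in markers}
        for cluster, row in medians.items()
    }


def best_match(profile):
    """Return the reference cell type with the most markers, all of which agree."""
    best_name, best_size = "unassigned", 0
    for name, definition in REFERENCE.items():
        agrees = all(profile.get(m) == sign for m, sign in definition.items())
        if agrees and len(definition) > best_size:
            best_name, best_size = name, len(definition)
    return best_name


# Step 2 first here, so that the later steps see canonical marker names.
medians = {
    cluster: {standardize(m): v for m, v in row.items()}
    for cluster, row in raw_medians.items()
}
calls = call_markers(medians)                       # step 1: +/- per marker
names = {c: best_match(p) for c, p in calls.items()}  # step 3: cell type names

for cluster in sorted(names):
    definition = " ".join(f"{m}{s}" for m, s in sorted(calls[cluster].items()))
    print(cluster, "|", definition, "|", names[cluster])
```

In this sketch a cluster is only named when every marker in a reference definition agrees with the cluster's calls, and ties are broken in favor of the more specific (longer) definition. A production tool would also need to handle partial matches and markers missing from the panel.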