Rudra Pratyaydipta, Baxter Ryan, Hsieh Elena W Y, Ghosh Debashis
Department of Statistics, Oklahoma State University, Stillwater, OK 74078, USA.
Department of Immunology and Microbiology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.
Bioinform Adv. 2022 Feb 11;2(1):vbac003. doi: 10.1093/bioadv/vbac003. eCollection 2022.
Cell-type abundance data arising from mass cytometry experiments are compositional in nature. Classical association tests do not apply to compositional data because of their non-Euclidean nature. Existing methods for analyzing cell-type abundance data have several limitations when applied to high-dimensional mass cytometry data, especially when the sample size is small.
We propose a new multivariate statistical learning methodology, Compositional Data Analysis using Kernels (CODAK), based on the kernel distance covariance (KDC) framework, to test the association of cell-type compositions with important predictors (categorical or continuous) such as disease status. CODAK scales well to high-dimensional data and provides satisfactory performance for small sample sizes (n < 25). We conducted simulation studies to compare the performance of the method with existing methods for analyzing cell-type abundance data from mass cytometry studies. The method is also applied to a high-dimensional dataset containing different subgroups of populations, including Systemic Lupus Erythematosus (SLE) patients and healthy control subjects.
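To make the idea of a distance-covariance association test on compositions concrete, the following is a minimal sketch and not the CODAK implementation. It assumes the Aitchison distance (Euclidean distance after a centered log-ratio transform) between compositions, absolute difference between predictor values (a binary group can be coded 0/1), a small pseudo-count to handle zero proportions, and a permutation p-value. All function names and default values here are hypothetical choices made for illustration.

```python
import math
import random


def clr(composition, pseudo=1e-6):
    """Centered log-ratio transform; the pseudo-count for zeros is an assumption."""
    shifted = [p + pseudo for p in composition]
    total = sum(shifted)
    logs = [math.log(p / total) for p in shifted]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def distance_matrix(points):
    """Pairwise Euclidean distances between rows of `points`."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    n = len(d)
    row_mean = [sum(r) / n for r in d]  # equals column means: d is symmetric
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]


def dcov2(a, b, order=None):
    """Squared sample distance covariance from two double-centered matrices.

    `order` optionally relabels the samples of `b` (used for permutations).
    """
    n = len(a)
    idx = list(range(n)) if order is None else order
    return sum(a[i][j] * b[idx[i]][idx[j]]
               for i in range(n) for j in range(n)) / n ** 2


def dcov_permutation_test(compositions, predictor, n_perm=999, seed=0):
    """Return (statistic, permutation p-value) for composition vs. predictor."""
    rng = random.Random(seed)
    a = double_center(distance_matrix([clr(c) for c in compositions]))
    b = double_center(distance_matrix([[float(v)] for v in predictor]))
    observed = dcov2(a, b)
    idx = list(range(len(predictor)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # break any association by relabeling the predictor
        if dcov2(a, b, idx) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data (invented for illustration): 3 cell types, 6 controls and 6 cases.
controls = [[0.70 + 0.01 * k, 0.20, 0.10 - 0.01 * k] for k in range(6)]
cases = [[0.20 + 0.01 * k, 0.30, 0.50 - 0.01 * k] for k in range(6)]
stat, p_value = dcov_permutation_test(controls + cases, [0] * 6 + [1] * 6)
print(f"dCov^2 = {stat:.4f}, permutation p = {p_value:.4f}")
```

Permuting the sample labels of one centered distance matrix is equivalent to recomputing it on permuted data, so the matrices are built only once. A permutation p-value is used here because it makes no large-sample assumptions, which is in the spirit of the small-sample setting described above.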
CODAK is implemented in R. The code and data used in this manuscript are available at http://github.com/GhoshLab/CODAK/.
Supplementary data are available online.