Compositional Data Analysis using Kernels in mass cytometry data.

Authors

Rudra Pratyaydipta, Baxter Ryan, Hsieh Elena W Y, Ghosh Debashis

Affiliations

Department of Statistics, Oklahoma State University, Stillwater, OK 74078, USA.

Department of Immunology and Microbiology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.

Publication information

Bioinform Adv. 2022 Feb 11;2(1):vbac003. doi: 10.1093/bioadv/vbac003. eCollection 2022.

Abstract

MOTIVATION

Cell-type abundance data arising from mass cytometry experiments are compositional in nature. Classical association tests do not apply to compositional data because of their non-Euclidean nature. Existing methods for analyzing cell-type abundance data have several limitations for high-dimensional mass cytometry data, especially when the sample size is small.

RESULTS

We propose a new multivariate statistical learning methodology, Compositional Data Analysis using Kernels (CODAK), based on the kernel distance covariance (KDC) framework, to test the association of cell-type compositions with important predictors (categorical or continuous) such as disease status. CODAK scales well to high-dimensional data and performs satisfactorily for small sample sizes (<25). We conducted simulation studies to compare its performance with that of existing methods for analyzing cell-type abundance data from mass cytometry studies. The method is also applied to a high-dimensional dataset containing different population subgroups, including systemic lupus erythematosus (SLE) patients and healthy control subjects.
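To make the idea of a distance-covariance association test on compositions concrete, the following is a minimal Python sketch and not the CODAK implementation (which is written in R). It assumes a centered log-ratio (CLR) transform with a pseudocount, plain Euclidean distances, and a permutation p-value; CODAK's actual kernels, distances, and inference procedure may differ. All function names, parameter values, and the simulated data are hypothetical.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples x cell-types count matrix.
    The pseudocount (an assumption of this sketch) guards against log(0)."""
    x = np.asarray(counts, dtype=float) + pseudocount
    props = x / x.sum(axis=1, keepdims=True)
    logp = np.log(props)
    return logp - logp.mean(axis=1, keepdims=True)

def pairwise_dist(a):
    """Euclidean distance matrix between rows (1-D input is treated as a column)."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcov_sq(dx, dy):
    """Squared sample distance covariance (V-statistic) from two distance matrices."""
    return (double_center(dx) * double_center(dy)).mean()

def dcov_perm_test(counts, predictor, n_perm=999, seed=0):
    """Permutation test of association between compositions and a predictor."""
    rng = np.random.default_rng(seed)
    dx = pairwise_dist(clr(counts))
    dy = pairwise_dist(predictor)
    observed = dcov_sq(dx, dy)
    n = dx.shape[0]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permuting rows and columns of dy together relabels the samples.
        if dcov_sq(dx, dy[np.ix_(p, p)]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Simulated example (hypothetical data): 10 samples per group, 4 cell types,
# 1000 cells per sample, with clearly different mean compositions by group.
data_rng = np.random.default_rng(1)
group = np.repeat([0, 1], 10)
alphas = {0: [50, 30, 10, 10], 1: [10, 10, 30, 50]}
counts = np.array([
    data_rng.multinomial(1000, data_rng.dirichlet(alphas[g])) for g in group
])

stat, p_value = dcov_perm_test(counts, group)
print(f"dCov^2 = {stat:.4f}, permutation p-value = {p_value:.4f}")
```

Because the test works on pairwise distances between samples rather than on individual cell types, the number of cell types affects only the distance computation, which is one reason distance-based tests of this kind can remain usable when the dimension is large relative to the sample size.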

AVAILABILITY AND IMPLEMENTATION

CODAK is implemented in R. The code and data used in this manuscript are available at http://github.com/GhoshLab/CODAK/.

CONTACT

prudra@okstate.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/505a/9710596/5e8f7e6895f8/vbac003f1.jpg