Compositional Data Analysis using Kernels in mass cytometry data.

Authors

Rudra Pratyaydipta, Baxter Ryan, Hsieh Elena W Y, Ghosh Debashis

Affiliations

Department of Statistics, Oklahoma State University, Stillwater, OK 74078, USA.

Department of Immunology and Microbiology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.

Publication information

Bioinform Adv. 2022 Feb 11;2(1):vbac003. doi: 10.1093/bioadv/vbac003. eCollection 2022.

DOI:10.1093/bioadv/vbac003

PMID:35224501

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8867823/

Abstract

MOTIVATION

Cell-type abundance data arising from mass cytometry experiments are compositional in nature. Classical association tests do not apply to the compositional data due to their non-Euclidean nature. Existing methods for analysis of cell type abundance data suffer from several limitations for high-dimensional mass cytometry data, especially when the sample size is small.

RESULTS

We proposed a new multivariate statistical learning methodology, Compositional Data Analysis using Kernels (CODAK), based on the kernel distance covariance (KDC) framework to test the association of the cell type compositions with important predictors (categorical or continuous) such as disease status. CODAK scales well for high-dimensional data and provides satisfactory performance for small sample sizes (<25). We conducted simulation studies to compare the performance of the method with existing methods of analyzing cell type abundance data from mass cytometry studies. The method is also applied to a high-dimensional dataset containing different subgroups of populations including Systemic Lupus Erythematosus (SLE) patients and healthy control subjects.
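
To make the idea of a distance-covariance association test on compositions concrete, the following is a minimal illustrative sketch and not the authors' CODAK implementation (their R code is at the repository listed under Availability). It assumes, purely for illustration, the Aitchison distance between compositions (Euclidean distance after a centered log-ratio transform), Euclidean distance on a numeric predictor, and a permutation p-value. All function names and the pseudocount are hypothetical choices.

```python
import math
import random


def clr(composition, pseudocount=1e-6):
    """Centered log-ratio transform; the pseudocount (an assumption) guards against zeros."""
    logs = [math.log(p + pseudocount) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def distance_matrix(points):
    """Pairwise Euclidean distances between the rows of `points`."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def double_center(d):
    """Subtract row and column means and add back the grand mean (d must be symmetric)."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
            for i in range(n)]


def dcov_statistic(a, b, order=None):
    """Squared sample distance covariance from two double-centered matrices.

    `order` optionally permutes the samples of `b`, which is how the
    permutation null distribution is generated below.
    """
    n = len(a)
    if order is None:
        order = list(range(n))
    total = sum(a[i][j] * b[order[i]][order[j]] for i in range(n) for j in range(n))
    return total / (n * n)


def permutation_test(compositions, predictor, n_perm=999, seed=0):
    """Return (observed statistic, permutation p-value) for composition vs. predictor."""
    rng = random.Random(seed)
    # Euclidean distance on CLR-transformed data is the Aitchison distance.
    a = double_center(distance_matrix([clr(c) for c in compositions]))
    b = double_center(distance_matrix([[float(x)] for x in predictor]))
    observed = dcov_statistic(a, b)
    order = list(range(len(predictor)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if dcov_statistic(a, b, order) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up toy proportions for three cell types; 0 = control, 1 = patient.
    toy = [[0.50, 0.30, 0.20], [0.48, 0.32, 0.20], [0.52, 0.29, 0.19], [0.49, 0.31, 0.20],
           [0.20, 0.30, 0.50], [0.22, 0.28, 0.50], [0.19, 0.31, 0.50], [0.21, 0.30, 0.49]]
    status = [0, 0, 0, 0, 1, 1, 1, 1]
    stat, p_value = permutation_test(toy, status)
    print(f"statistic={stat:.4f}, permutation p-value={p_value:.3f}")
```

A binary predictor is coded 0/1 here; a continuous predictor can be passed in the same way. The kernel, distance, and p-value calculation actually used by CODAK may differ from these illustrative choices.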

AVAILABILITY AND IMPLEMENTATION

CODAK is implemented using R. The codes and the data used in this manuscript are available on the web at http://github.com/GhoshLab/CODAK/.

CONTACT

prudra@okstate.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.
