KERNEL-PENALIZED REGRESSION FOR ANALYSIS OF MICROBIOME DATA.

Authors

Randolph Timothy W, Zhao Sen, Copeland Wade, Hullar Meredith, Shojaie Ali

Affiliations

Fred Hutchinson Cancer Research Center.

University of Washington.

Publication information

Ann Appl Stat. 2018 Mar;12(1):540-566. doi: 10.1214/17-AOAS1102. Epub 2018 Mar 9.

DOI:10.1214/17-AOAS1102

PMID:30224943

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6138053/

Abstract

The analysis of human microbiome data is often based on dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample. Common to these ordination methods is the use of biologically motivated definitions of similarity. Principal coordinate analysis, in particular, is often performed using ecologically defined distances, allowing analyses to incorporate context-dependent, non-Euclidean structure. In this paper, we go beyond dimension-reduced ordination methods and describe a framework of high-dimensional regression models that extends these distance-based methods. In particular, we use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxon-specific associations with a phenotype or clinical outcome. Further, we show how this regression framework can be used to address the compositional nature of multivariate predictors composed of relative abundances; that is, vectors whose entries sum to a constant. We illustrate this approach with several simulations using data from two recent studies on gut and vaginal microbiomes. We conclude with an application to our own data, where we also incorporate a significance test for the estimated coefficients that represent associations between microbial abundance and percent fat.
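
The ideas named in the abstract (a distance turned into a kernel, a penalized regression, and taxon-specific coefficients for compositional predictors) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes simulated counts, a centered log-ratio (CLR) transform, Euclidean distance on CLR values as a stand-in for an ecological distance, and a plain ridge penalty. The function names `clr`, `distance_to_kernel`, and `kernel_ridge_fit` are hypothetical helpers introduced here.

```python
import numpy as np

def clr(X):
    """Centered log-ratio transform; rows of X are strictly positive compositions."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

def distance_to_kernel(D):
    """Gower centering as used in principal coordinate analysis: K = -1/2 * J D^2 J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J

def kernel_ridge_fit(K, y, lam):
    """Dual ridge solution: solve (K + lam * I) alpha = y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

# Simulated data (assumption): 20 samples, 8 taxa, counts shifted to avoid zeros.
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(20, 8)) + 1
X = counts / counts.sum(axis=1, keepdims=True)   # relative abundances, rows sum to 1
Z = clr(X)

# Pairwise Euclidean distances between CLR-transformed samples.
diff = Z[:, None, :] - Z[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))
K = distance_to_kernel(D)

# Centered outcome (simulated noise here, purely for illustration).
y = rng.normal(size=20)
y_centered = y - y.mean()

lam = 1.0
alpha = kernel_ridge_fit(K, y_centered, lam)
fitted = K @ alpha

# For a Euclidean distance the kernel is linear in the column-centered data,
# so per-taxon coefficients can be recovered from the dual solution.
Z_centered = Z - Z.mean(axis=0, keepdims=True)
beta = Z_centered.T @ alpha
```

With a Euclidean distance, the centered kernel equals the Gram matrix of the column-centered data, so `beta` matches the ordinary ridge estimate. A non-Euclidean ecological or phylogenetic distance would give a different kernel, and recovering per-taxon coefficients from it requires more care than this sketch shows.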

Similar articles

The Generalized Matrix Decomposition Biplot and Its Application to Microbiome Data.

mSystems. 2019 Dec 17;4(6):e00504-19. doi: 10.1128/mSystems.00504-19.

Sufficient dimension reduction for compositional data.

Biostatistics. 2021 Oct 13;22(4):687-705. doi: 10.1093/biostatistics/kxz060.

Transformation and differential abundance analysis of microbiome data incorporating phylogeny.

Bioinformatics. 2021 Dec 11;37(24):4652-4660. doi: 10.1093/bioinformatics/btab543.

coda4microbiome: compositional data analysis for microbiome cross-sectional and longitudinal studies.

BMC Bioinformatics. 2023 Mar 6;24(1):82. doi: 10.1186/s12859-023-05205-3.

A small-sample multivariate kernel machine test for microbiome association studies.

Genet Epidemiol. 2017 Apr;41(3):210-220. doi: 10.1002/gepi.22030. Epub 2016 Dec 26.

A powerful microbiome-based association test and a microbial taxa discovery framework for comprehensive association mapping.

Microbiome. 2017 Apr 24;5(1):45. doi: 10.1186/s40168-017-0262-x.

MKMR: a multi-kernel machine regression model to predict health outcomes using human microbiome data.

Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad158.

Principal microbial groups: compositional alternative to phylogenetic grouping of microbiome data.

Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac328.

A Distance-Based Kernel Association Test Based on the Generalized Linear Mixed Model for Correlated Microbiome Studies.

Front Genet. 2019 May 16;10:458. doi: 10.3389/fgene.2019.00458. eCollection 2019.

Cited by

Bayesian compositional generalized linear mixed models for disease prediction using microbiome data.

BMC Bioinformatics. 2025 Apr 5;26(1):98. doi: 10.1186/s12859-025-06114-3.

Proportion-based normalizations outperform compositional data transformations in machine learning applications.

Microbiome. 2024 Mar 5;12(1):45. doi: 10.1186/s40168-023-01747-z.

GENERALIZED MATRIX DECOMPOSITION REGRESSION: ESTIMATION AND INFERENCE FOR TWO-WAY STRUCTURED DATA.

Ann Appl Stat. 2023 Dec;17(4):2944-2969. doi: 10.1214/23-aoas1746. Epub 2023 Oct 30.

Supervised learning and model analysis with compositional data.

PLoS Comput Biol. 2023 Jun 30;19(6):e1011240. doi: 10.1371/journal.pcbi.1011240. eCollection 2023 Jun.

Principal Amalgamation Analysis for Microbiome Data.

Genes (Basel). 2022 Jun 24;13(7):1139. doi: 10.3390/genes13071139.

Analysing microbiome intervention design studies: Comparison of alternative multivariate statistical methods.

PLoS One. 2021 Nov 18;16(11):e0259973. doi: 10.1371/journal.pone.0259973. eCollection 2021.

Microbial trend analysis for common dynamic trend, group comparison, and classification in longitudinal microbiome study.

BMC Genomics. 2021 Sep 15;22(1):667. doi: 10.1186/s12864-021-07948-w.

Tree-aggregated predictive modeling of microbiome data.

Sci Rep. 2021 Jul 15;11(1):14505. doi: 10.1038/s41598-021-93645-3.

Feature selection and causal analysis for microbiome studies in the presence of confounding using standardization.

BMC Bioinformatics. 2021 Jul 6;22(1):362. doi: 10.1186/s12859-021-04232-2.

mbImpute: an accurate and robust imputation method for microbiome data.

Genome Biol. 2021 Jun 28;22(1):192. doi: 10.1186/s13059-021-02400-4.
