KERNEL-PENALIZED REGRESSION FOR ANALYSIS OF MICROBIOME DATA.

Author information

Randolph Timothy W, Zhao Sen, Copeland Wade, Hullar Meredith, Shojaie Ali

Affiliations

Fred Hutchinson Cancer Research Center.

University of Washington.

Publication information

Ann Appl Stat. 2018 Mar;12(1):540-566. doi: 10.1214/17-AOAS1102. Epub 2018 Mar 9.

Abstract

The analysis of human microbiome data is often based on dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample. Common to these ordination methods is the use of biologically motivated definitions of similarity. Principal coordinate analysis, in particular, is often performed using ecologically defined distances, allowing analyses to incorporate context-dependent, non-Euclidean structure. In this paper, we go beyond dimension-reduced ordination methods and describe a framework of high-dimensional regression models that extends these distance-based methods. In particular, we use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxon-specific associations with a phenotype or clinical outcome. Further, we show how this regression framework can be used to address the compositional nature of multivariate predictors composed of relative abundances; that is, vectors whose entries sum to a constant. We illustrate this approach with several simulations using data from two recent studies on gut and vaginal microbiomes. We conclude with an application to our own data, where we also incorporate a significance test for the estimated coefficients that represent associations between microbial abundance and percent fat.
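
The link the abstract draws between distance-based ordination and penalized regression can be illustrated with a small numerical sketch. This is not the paper's estimator: it is a generic kernel ridge fit under stated assumptions, with simulated counts, a centered log-ratio (CLR) transform to handle compositionality, and a Euclidean distance standing in for an ecological one. The function names (`clr`, `gower_kernel`, `kernel_ridge_coefficients`), the pseudocount, and the penalty value are all hypothetical choices made for illustration. In this Euclidean special case the Gower-centered kernel equals the linear kernel, so the kernel fit reproduces ordinary ridge regression; a non-Euclidean distance (for example, a phylogenetic one) would give a different kernel, which may need adjustment if it is not positive semidefinite.

```python
import numpy as np

def clr(abundances, pseudocount=0.5):
    """Centered log-ratio transform of a samples-by-taxa matrix.
    The pseudocount (an assumption here) avoids log(0) for unobserved taxa."""
    logged = np.log(abundances + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

def gower_kernel(distances):
    """Convert an n-by-n distance matrix D into a similarity kernel
    K = -1/2 * J * D^2 * J, the matrix whose eigenvectors give PCoA axes."""
    n = distances.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * centering @ (distances ** 2) @ centering

def kernel_ridge_coefficients(Z, y, K, lam):
    """Dual-form ridge fit: alpha = (K + lam*I)^{-1} y, then map the
    sample-level weights back to taxon-level coefficients via Z^T alpha."""
    n = K.shape[0]
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    return Z.T @ alpha

# Simulated data: 12 samples, 30 taxa (purely illustrative).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=20, size=(12, 30)).astype(float)
Z = clr(counts)
Z = Z - Z.mean(axis=0)            # column-center the predictors
y = rng.normal(size=12)
y = y - y.mean()                  # center the outcome

# Pairwise Euclidean distances between samples.
diff = Z[:, None, :] - Z[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))

lam = 1.0                         # hypothetical penalty value
K = gower_kernel(D)
beta_kernel = kernel_ridge_coefficients(Z, y, K, lam)

# Ordinary (primal) ridge regression for comparison.
beta_ridge = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
```

Swapping `D` for an ecologically defined distance matrix changes only the kernel passed to `kernel_ridge_coefficients`, which is the sense in which a regression of this kind extends distance-based ordination.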
