Sufficient dimension reduction for compositional data.

Author information

CONICET and Facultad de Ingeniería Química, Universidad Nacional del Litoral, Santiago del Estero 2829, 3000 Santa Fe, Argentina and Institut Charles Delaunay/ROSAS Department, Systems Modelling and Dependability Team, Université de Technologie de Troyes, 12 rue Marie Curie, 10004 Troyes Cedex, France.

CONICET and Facultad de Ingeniería Química, Universidad Nacional del Litoral, Santiago del Estero 2829, 3000 Santa Fe, Argentina.

Publication information

Biostatistics. 2021 Oct 13;22(4):687-705. doi: 10.1093/biostatistics/kxz060.

Abstract

Recent efforts to characterize the human microbiome and its relation to chronic diseases have led to a surge in statistical development for compositional data. We develop likelihood-based sufficient dimension reduction (SDR) methods to find linear combinations that contain all the information in the compositional data on an outcome variable, i.e., are sufficient for modeling and prediction of the outcome. We consider several models for the inverse regression of the compositional vector, or transformations of it, as a function of the outcome. They include normal, multinomial, and Poisson graphical models that allow for complex dependencies among observed counts. These methods yield efficient estimators of the reduction and can be applied to continuous or categorical outcomes. We incorporate variable selection into the estimation via penalties and address important invariance issues arising from the compositional nature of the data. We illustrate and compare our methods and some established methods for analyzing microbiome data in simulations and using data from the Human Microbiome Project. Displaying the data in the coordinate system of the SDR linear combinations allows visual inspection and facilitates comparisons across studies.
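
To make the idea of a sufficient reduction for compositional data more concrete, here is a minimal Python sketch. It is not the likelihood-based estimator described in the abstract: it swaps in classical sliced inverse regression (SIR), a moment-based SDR method, applied to centered log-ratio (CLR) coordinates. The function names `clr` and `sir_directions`, the number of slices, and the eigenvalue tolerance are all assumptions chosen for illustration.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform of strictly positive compositions (rows).

    Each row is mapped to log(x) minus its row mean, so rows sum to zero and
    the result does not change if a row is multiplied by a positive constant.
    """
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)


def sir_directions(z, y, n_slices=5, d=1):
    """Illustrative sliced inverse regression (SIR), not the paper's estimator.

    z : (n, p) predictors, e.g. CLR coordinates
    y : (n,) continuous outcome
    Returns a (p, d) matrix whose columns are estimated reduction directions.
    """
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    z = z - z.mean(axis=0)
    cov = np.cov(z, rowvar=False)

    # CLR data have a singular covariance (rows sum to zero), so whiten only
    # along eigen-directions with clearly positive variance.
    w, v = np.linalg.eigh(cov)
    keep = w > 1e-10 * w.max()
    whiten = v[:, keep] / np.sqrt(w[keep])  # shape (p, r)
    zw = z @ whiten                         # shape (n, r)

    # Slice the sample by the sorted outcome and accumulate the weighted
    # outer products of the within-slice means.
    order = np.argsort(y)
    m = np.zeros((zw.shape[1], zw.shape[1]))
    for idx in np.array_split(order, n_slices):
        mean_h = zw[idx].mean(axis=0)
        m += (len(idx) / len(y)) * np.outer(mean_h, mean_h)

    # Leading eigenvectors of the slice-mean matrix, mapped back to the
    # original predictor scale.
    _, eta = np.linalg.eigh(m)
    top = eta[:, ::-1][:, :d]
    return whiten @ top
```

Under these assumptions, calling `sir_directions(clr(x), y)` on positive compositions `x` returns coefficient vectors that sum to zero, so the resulting linear combinations are log-contrasts and are unaffected by rescaling a sample. Projecting the CLR data onto these directions gives low-dimensional coordinates that can be plotted against the outcome.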

Similar articles

1
Sufficient dimension reduction for compositional data.
Biostatistics. 2021 Oct 13;22(4):687-705. doi: 10.1093/biostatistics/kxz060.
2
Sufficient dimension reduction for censored predictors.
Biometrics. 2017 Mar;73(1):220-231. doi: 10.1111/biom.12556. Epub 2016 Aug 9.
3
Generalized linear models with linear constraints for microbiome compositional data.
Biometrics. 2019 Mar;75(1):235-244. doi: 10.1111/biom.12956. Epub 2018 Aug 10.
4
A logistic normal multinomial regression model for microbiome compositional data analysis.
Biometrics. 2013 Dec;69(4):1053-63. doi: 10.1111/biom.12079. Epub 2013 Oct 15.
5
Prediction analysis for microbiome sequencing data.
Biometrics. 2019 Sep;75(3):875-884. doi: 10.1111/biom.13061. Epub 2019 Apr 17.
6
Kernel-penalized regression for analysis of microbiome data.
Ann Appl Stat. 2018 Mar;12(1):540-566. doi: 10.1214/17-AOAS1102. Epub 2018 Mar 9.
7
coda4microbiome: compositional data analysis for microbiome cross-sectional and longitudinal studies.
BMC Bioinformatics. 2023 Mar 6;24(1):82. doi: 10.1186/s12859-023-05205-3.
8
Bayesian compositional regression with structured priors for microbiome feature selection.
Biometrics. 2021 Sep;77(3):824-838. doi: 10.1111/biom.13335. Epub 2020 Jul 31.
9
gCoda: Conditional Dependence Network Inference for Compositional Data.
J Comput Biol. 2017 Jul;24(7):699-708. doi: 10.1089/cmb.2017.0054. Epub 2017 May 10.
10
Direct interaction network and differential network inference from compositional data via lasso penalized D-trace loss.
PLoS One. 2019 Jul 24;14(7):e0207731. doi: 10.1371/journal.pone.0207731. eCollection 2019.

Cited by

1
Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets.
Front Bioinform. 2023 Aug 10;3:1211819. doi: 10.3389/fbinf.2023.1211819. eCollection 2023.
