CONICET and Facultad de Ingeniería Química, Universidad Nacional del Litoral, Santiago del Estero 2829, 3000 Santa Fe, Argentina and Institut Charles Delaunay/ROSAS Department, Systems Modelling and Dependability Team, Université de Technologie de Troyes, 12 rue Marie Curie, 10004 Troyes Cedex, France.
CONICET and Facultad de Ingeniería Química, Universidad Nacional del Litoral, Santiago del Estero 2829, 3000 Santa Fe, Argentina.
Biostatistics. 2021 Oct 13;22(4):687-705. doi: 10.1093/biostatistics/kxz060.
Recent efforts to characterize the human microbiome and its relation to chronic diseases have led to a surge in statistical development for compositional data. We develop likelihood-based sufficient dimension reduction (SDR) methods to find linear combinations that contain all the information in the compositional data on an outcome variable, i.e., are sufficient for modeling and prediction of the outcome. We consider several models for the inverse regression of the compositional vector, or transformations of it, as a function of the outcome. They include normal, multinomial, and Poisson graphical models that allow for complex dependencies among observed counts. These methods yield efficient estimators of the reduction and can be applied to continuous or categorical outcomes. We incorporate variable selection into the estimation via penalties and address important invariance issues arising from the compositional nature of the data. We illustrate and compare our methods and some established methods for analyzing microbiome data in simulations and using data from the Human Microbiome Project. Displaying the data in the coordinate system of the SDR linear combinations allows visual inspection and facilitates comparisons across studies.
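To make the idea of an inverse-regression reduction concrete, the following is a minimal sketch and not the estimator developed in the paper. It assumes a categorical outcome, a centered log-ratio (CLR) transform of the compositions, and a normal inverse model with a common covariance, under which the reduction is spanned by the inverse covariance applied to the class-mean deviations. The function names `clr` and `normal_inverse_reduction`, the simulated data, and the use of a pseudo-inverse to handle the singular CLR covariance are all illustrative assumptions; no penalties or variable selection are included.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform of each row (entries must be positive).

    For count data with zeros, adding a small pseudocount first is a common
    workaround; that choice is an assumption and is not made here.
    """
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)


def normal_inverse_reduction(z, y, d):
    """Basis (p x d) for a reduction under the assumed model z | y ~ N(mu_y, Sigma)."""
    n, p = z.shape
    classes = np.unique(y)
    grand_mean = z.mean(axis=0)
    within = np.zeros((p, p))
    deviations = []
    for c in classes:
        zc = z[y == c]
        mu_c = zc.mean(axis=0)
        within += (zc - mu_c).T @ (zc - mu_c)
        # Weight each class-mean deviation by the square root of its class proportion.
        deviations.append(np.sqrt(len(zc) / n) * (mu_c - grand_mean))
    within /= n - len(classes)
    # CLR rows sum to zero, so the covariance is singular: use a pseudo-inverse.
    m = np.linalg.pinv(within, rcond=1e-10) @ np.array(deviations).T
    # Leading left singular vectors give an orthonormal basis of the column space.
    u, _, _ = np.linalg.svd(m, full_matrices=False)
    return u[:, :d]


# Simulated example (hypothetical data): 3 classes, 6 components.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=300)
shift = np.zeros((3, 6))
shift[1, 0] = 1.0   # class 1 enriches component 0
shift[2, 1] = 1.0   # class 2 enriches component 1
abundance = np.exp(rng.normal(size=(300, 6)) + shift[y])

z = clr(abundance)
basis = normal_inverse_reduction(z, y, d=2)
scores = z @ basis  # coordinates of each sample in the reduced space
```

Because the CLR transform removes any per-sample scaling, `clr(abundance)` and `clr(3 * abundance)` coincide, which is one simple instance of the invariance that compositional data require. The two columns of `scores` can be plotted against each other, colored by `y`, to inspect the data in the reduced coordinate system.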