Coenders Germà, Greenacre Michael
Department of Economics, Universitat de Girona, Girona, Spain.
Department of Economics and Business and Barcelona School of Management, Universitat Pompeu Fabra, Barcelona, Spain.
J Appl Stat. 2022 Aug 6;50(16):3272-3293. doi: 10.1080/02664763.2022.2108007. eCollection 2023.
Logratios between pairs of compositional parts (pairwise logratios) are the easiest to interpret in compositional data analysis, and include the well-known additive logratios as particular cases. When the number of parts is large (sometimes even larger than the number of cases), some form of logratio selection is needed. In this article, we present three alternative stepwise supervised learning methods to select the pairwise logratios that best explain a dependent variable in a generalized linear model, each geared to a specific problem. The first method features unrestricted search, where any pairwise logratio can be selected. This method has a complex interpretation if some pairs of parts in the logratios overlap, but it leads to the most accurate predictions. The second method restricts parts to occur only once, which makes the corresponding logratios intuitively interpretable. The third method uses additive logratios, so that K-1 selected logratios involve a K-part subcomposition. Our approach allows logratios or non-compositional covariates to be forced into the models based on theoretical knowledge, and various stopping criteria are available based on information measures or statistical significance with the Bonferroni correction. We present an application on a dataset from a study predicting Crohn's disease.
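The three search restrictions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the method labels ("unrestricted", "no_overlap", "alr") and the generic score callback are assumptions made for illustration. In practice the score would come from fitting a generalized linear model on the selected logratios (for example, the negative of an information criterion), which is left out here. The additive-logratio rule below is one reading of the abstract: every selected pair must share a common reference part, and each new pair must bring in exactly one new part.

```python
from itertools import combinations
from math import log


def pairwise_logratio(row, i, j):
    """log(x_i / x_j) for one composition given as a dict of positive parts."""
    return log(row[i] / row[j])


def candidate_pairs(parts, selected, method):
    """Pairs still allowed at the next step under each search rule.

    method is one of "unrestricted", "no_overlap", "alr"
    (labels invented for this sketch).
    """
    all_pairs = [p for p in combinations(parts, 2) if p not in selected]
    if method == "unrestricted" or not selected:
        return all_pairs
    used = {part for pair in selected for part in pair}
    if method == "no_overlap":
        # Each part may appear in at most one selected logratio.
        return [p for p in all_pairs if not (set(p) & used)]
    if method == "alr":
        # Parts shared by every selected pair are possible reference parts.
        refs = set.intersection(*(set(p) for p in selected))
        # Allow only pairs joining one reference part with one new part.
        return [p for p in all_pairs
                if len(set(p) & refs) == 1 and len(set(p) - used) == 1]
    raise ValueError(f"unknown method: {method}")


def forward_select(parts, score, method, max_steps):
    """Greedy forward search: add the pair that most improves score(selected)."""
    selected = []
    best = score(selected)
    for _ in range(max_steps):
        options = candidate_pairs(parts, selected, method)
        if not options:
            break
        top = max(options, key=lambda p: score(selected + [p]))
        new = score(selected + [top])
        if new <= best:  # stop when no candidate improves the score
            break
        selected.append(top)
        best = new
    return selected


# Toy score standing in for a model-based criterion (hypothetical weights).
WEIGHTS = {("A", "B"): 3, ("A", "C"): 2, ("C", "D"): 1}


def toy_score(selected):
    return sum(WEIGHTS.get(pair, -1) for pair in selected)


PARTS = ["A", "B", "C", "D"]
unrestricted = forward_select(PARTS, toy_score, "unrestricted", 3)
no_overlap = forward_select(PARTS, toy_score, "no_overlap", 3)
alr = forward_select(PARTS, toy_score, "alr", 3)
```

With this toy score, the unrestricted search may reuse parts across logratios, the non-overlapping search stops once no pair of unused parts remains, and the additive-logratio search keeps a single common part in every selected pair.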