Warton David I, Thibaut Loïc, Wang Yi Alice
School of Mathematics and Statistics and the Evolution & Ecology Research Centre, UNSW Sydney, NSW, Australia.
School of Mathematics and Statistics, UNSW Sydney, NSW, Australia.
PLoS One. 2017 Jul 24;12(7):e0181790. doi: 10.1371/journal.pone.0181790. eCollection 2017.
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, residual resampling is difficult to extend to regression settings where residuals are not identically distributed (and thus not amenable to bootstrapping). Common examples include logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap. It assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals preserves correlation in the data without the need to model it, a key point of difference from a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties compared to competing resampling methods.
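The resampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that are not in the source: the marginal distribution F is taken to be Poisson, the fitted means `Mu` are supplied directly rather than estimated by a regression, and all function names (`pit_residuals`, `invert_residuals`, `pit_trap_resample`) are hypothetical. For a count y with fitted mean mu, a PIT-residual is drawn uniformly between F(y - 1; mu) and F(y; mu). Whole rows of residuals are then resampled, and each resampled residual is mapped back through the quantile function at the fitted mean of the row it lands on.

```python
import math
import random


def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu); 0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)


def poisson_quantile(u, mu, max_k=10_000):
    """Smallest k with P(Y <= k) >= u (search capped at max_k)."""
    k = 0
    term = math.exp(-mu)
    total = term
    while total < u and k < max_k:
        k += 1
        term *= mu / k
        total += term
    return k


def pit_residuals(Y, Mu, rng):
    """Randomised PIT-residuals: u drawn in (F(y-1), F(y)] for each cell."""
    U = []
    for y_row, mu_row in zip(Y, Mu):
        u_row = []
        for y, mu in zip(y_row, mu_row):
            lower = poisson_cdf(y - 1, mu)
            upper = poisson_cdf(y, mu)
            v = 1.0 - rng.random()  # uniform on (0, 1]
            u_row.append(min(lower + v * (upper - lower), upper))
        U.append(u_row)
    return U


def invert_residuals(U, Mu):
    """Map residuals back to counts via the quantile function at each mean."""
    return [
        [poisson_quantile(u, mu) for u, mu in zip(u_row, mu_row)]
        for u_row, mu_row in zip(U, Mu)
    ]


def pit_trap_resample(Y, Mu, rng):
    """One resample: bootstrap rows of residuals, then invert them."""
    U = pit_residuals(Y, Mu, rng)
    n = len(U)
    rows = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
    U_star = [U[i] for i in rows]  # each row is kept intact across columns
    return invert_residuals(U_star, Mu)


if __name__ == "__main__":
    rng = random.Random(2017)
    Y = [[0, 3], [2, 1], [5, 0], [1, 4]]  # toy counts, 4 sites x 2 taxa
    Mu = [[1.0, 2.5], [1.5, 2.0], [3.0, 1.0], [2.0, 3.0]]  # assumed fitted means
    print(pit_trap_resample(Y, Mu, rng))
```

Because each row of residuals is resampled as a unit, any dependence between columns is carried into the resampled data without being modelled explicitly. Inverting at the original fitted means keeps each resampled value on the scale of the assumed marginal distribution.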