Follmann Dean, Proschan Michael, Leifer Eric
National Institute of Allergy and Infectious Diseases, 6700B Rockledge Drive MSC 7609, Bethesda, Maryland 20892, USA.
Biometrics. 2003 Jun;59(2):420-9. doi: 10.1111/1541-0420.00049.
This article applies a simple method to settings where the data are clustered but the available statistical methods assume independent data. We assume the statistical method provides a normally distributed estimate, theta, and an estimate of its variance, sigma^2. We randomly select one data point from each cluster and apply the statistical method to the resulting independent data. We repeat this multiple times and use the average of the resulting theta values as our estimate. An estimate of the variance is given by the average of the sigma^2 values minus the sample variance of the theta values. We call this procedure multiple outputation, as all "excess" data within each cluster are thrown out multiple times. Hoffman, Sen, and Weinberg (2001, Biometrika 88, 1121-1134) introduced this approach for generalized linear models when cluster size is related to outcome. In this article, we demonstrate the broad applicability of the approach. Applications to angular data, p-values, vector parameters, Bayesian inference, genetics data, and random cluster sizes are discussed. In addition, asymptotic normality of estimates based on all possible outputations, as well as on a finite number of outputations, is proven under weak conditions. Multiple outputation provides a simple and broadly applicable method for analyzing clustered data. It is especially suited to settings where methods for clustered data are impractical, but it can also be applied generally as a quick and simple tool.
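The procedure described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `multiple_outputation` and `mean_estimator`, the default number of outputations, and the choice of the sample mean as the independent-data method are all hypothetical. The sketch draws one observation per cluster, applies the estimator, repeats, and then combines the results as the average of the theta values and the average of the sigma^2 values minus the sample variance of the theta values.

```python
import numpy as np


def mean_estimator(sample):
    """Hypothetical independent-data method: sample mean and its estimated variance."""
    return sample.mean(), sample.var(ddof=1) / len(sample)


def multiple_outputation(clusters, estimator, n_outputations=1000, seed=0):
    """Sketch of multiple outputation for a scalar parameter.

    clusters: list of sequences, one per cluster.
    estimator: function mapping an independent sample to (theta, sigma^2).
    """
    rng = np.random.default_rng(seed)
    thetas = np.empty(n_outputations)
    variances = np.empty(n_outputations)
    for m in range(n_outputations):
        # Keep one randomly chosen observation per cluster, discard the rest.
        sample = np.array([c[rng.integers(len(c))] for c in clusters])
        thetas[m], variances[m] = estimator(sample)
    theta_mo = thetas.mean()
    # Average within-outputation variance minus the between-outputation variance.
    var_mo = variances.mean() - thetas.var(ddof=1)
    return theta_mo, var_mo


if __name__ == "__main__":
    # Made-up clustered data, for illustration only.
    clusters = [[1.2, 0.9, 1.1], [2.3], [0.4, 0.7], [1.8, 2.1, 1.9, 2.0], [1.0, 1.3]]
    theta, var = multiple_outputation(clusters, mean_estimator, n_outputations=500)
    print(theta, var)
```

With a finite number of outputations the variance estimate can turn out small or even negative, so a larger `n_outputations` is generally safer. When every cluster holds a single observation, all outputations coincide and the sketch reduces to the ordinary independent-data estimate.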