Glynn Adam, Wakefield Jon, Handcock Mark S, Richardson Thomas S
Department of Statistics, University of Washington, Seattle, USA.
J R Stat Soc Ser A Stat Soc. 2008 Jan 1;171(1):179-202. doi: 10.1111/j.1467-985X.2007.00511.x.
In this paper, we illustrate that, in situations where a linear model is appropriate, combining ecological data with subsample data provides three main benefits. First, including the individual-level subsample data can eliminate the biases associated with linear ecological inference. Second, supplementing the subsample data with ecological data increases the information about the parameters. Third, readily available ecological data can be used to design optimal subsampling schemes that further increase the information about the parameters. We apply this methodology to the classic problem of estimating the effect of a college degree on wages. We show that combining ecological data with subsample data provides precise estimates of this effect, and that optimal subsampling schemes (conditional on the ecological data) can provide good precision with only a fraction of the observations.
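The first two benefits can be illustrated with a small simulation. This is a minimal sketch, not the authors' estimator. It assumes a linear model with an area-level contextual effect, y_ij = α + β x_ij + γ x̄_i + ε_ij, which is one common way linear ecological bias arises. Here x_ij = 1 if person j in area i holds a degree and x̄_i is the area's degree rate. All parameter values, sample sizes and variable names are hypothetical. Regressing area means on x̄_i recovers β + γ rather than β. A weighted least-squares fit that stacks the area means (weighted by their variance σ²/N) with a small individual subsample separates β from γ.

```python
import numpy as np

# Hypothetical simulation settings (not taken from the paper).
rng = np.random.default_rng(0)
G, N = 200, 500                       # number of areas, people per area
alpha, beta, gamma, sigma = 10.0, 5.0, 8.0, 4.0

# Individual-level population: x = 1 if the person holds a degree.
p = rng.uniform(0.1, 0.6, size=G)     # area-level degree probabilities
group = np.repeat(np.arange(G), N)
x = (rng.random(G * N) < p[group]).astype(float)
xbar = np.bincount(group, weights=x) / N        # observed area degree rate
# Assumed model: the contextual effect gamma is the source of ecological bias.
y = alpha + beta * x + gamma * xbar[group] + rng.normal(0.0, sigma, G * N)
ybar = np.bincount(group, weights=y) / N        # observed area mean outcome

# 1) Ecological regression: uses only the area means (xbar, ybar).
X_eco = np.column_stack([np.ones(G), xbar])
eco_slope = np.linalg.lstsq(X_eco, ybar, rcond=None)[0][1]

# 2) Individual subsample of m people, each tagged with their area's xbar.
m = 1000
idx = rng.choice(G * N, size=m, replace=False)
X_sub = np.column_stack([np.ones(m), x[idx], xbar[group[idx]]])
y_sub = y[idx]

# 3) Combined weighted least squares. Each area mean satisfies
#    ybar_i = alpha + (beta + gamma) * xbar_i + error with variance sigma^2 / N,
#    so its row is scaled by sqrt(N); individual rows keep weight 1.
#    Simplification: ignores that subsampled people also contribute to ybar.
X_agg = np.column_stack([np.ones(G), xbar, xbar])
w = np.concatenate([np.full(G, np.sqrt(N)), np.ones(m)])
X_all = np.vstack([X_agg, X_sub]) * w[:, None]
y_all = np.concatenate([ybar, y_sub]) * w
alpha_hat, beta_hat, gamma_hat = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

print(f"ecological slope (estimates beta + gamma): {eco_slope:.2f}")
print(f"combined estimate of beta: {beta_hat:.2f}")
print(f"combined estimate of gamma: {gamma_hat:.2f}")
```

In this setup the area means pin down β + γ tightly, so the subsample mainly needs to separate β from γ. The optimal subsampling designs described above, which choose whom to sample using the ecological data, are not attempted in this sketch.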