Li Xiang, Kuk Anthony Y C, Xu Jinfeng
Department of Statistics and Applied Probability, National University of Singapore, Singapore, 117546.
Stat Med. 2014 Dec 10;33(28):4999-5014. doi: 10.1002/sim.6304. Epub 2014 Sep 12.
Human biomonitoring of exposure to environmental chemicals is important. Individual monitoring is not viable because individual exposure levels are low, the volume of material per subject is insufficient, and taking measurements from many subjects is prohibitively expensive. Pooling of samples is an efficient and cost-effective way to collect data. Estimation is, however, complicated because the individual values within each pool are not observed and are known only through their average or weighted average. The distribution of such averages is intractable when the individual measurements are lognormally distributed, which is a common assumption. We propose to replace the intractable distribution of the pool averages by a Gaussian likelihood to obtain parameter estimates. If the pool size is large, this method produces statistically efficient estimates; regardless of pool size, it yields consistent estimates as the number of pools increases. An empirical Bayes (EB) Gaussian likelihood approach, as well as its Bayesian analog, is developed to pool information from various demographic groups by using a mixed-effect formulation. We also discuss methods to estimate the underlying mean-variance relationship and to select a good model for the means, both of which can be incorporated into the proposed EB or Bayes framework. By borrowing strength across groups, the EB estimator is more efficient than the individual group-specific estimator. Simulation results show that the EB Gaussian likelihood estimates outperform a previous method proposed for the National Health and Nutrition Examination Surveys, with much smaller bias and better coverage in interval estimation, especially after bias correction.
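The core idea of replacing the intractable distribution of pool averages by a Gaussian likelihood can be illustrated with a minimal sketch. It assumes the simplest setting only: a single group, pools formed as plain (unweighted) averages of independent lognormal(mu, sigma^2) measurements, and known pool sizes. The function names `simulate_pools` and `fit_pooled_lognormal` are hypothetical and not taken from the paper, and the sketch does not include the EB step, the mean-variance modeling, or the bias correction described above.

```python
import math
import random


def simulate_pools(n_pools, pool_size, mu, sigma, rng):
    """Simulate pool averages of i.i.d. lognormal(mu, sigma^2) measurements.

    Only the averages are returned, mimicking the fact that individual
    values inside a pool are never observed.
    """
    means = []
    for _ in range(n_pools):
        values = [rng.lognormvariate(mu, sigma) for _ in range(pool_size)]
        means.append(sum(values) / pool_size)
    return means, [pool_size] * n_pools


def fit_pooled_lognormal(pool_means, pool_sizes):
    """Fit (mu, sigma^2) by maximizing a Gaussian working likelihood.

    Assumption (illustrative): pool i has mean m and variance v / K_i, where
    K_i is its size, m = exp(mu + sigma^2 / 2) and v = (exp(sigma^2) - 1) * m^2
    are the mean and variance of a single lognormal measurement.
    """
    n = len(pool_means)
    # Under the Gaussian working likelihood the maximizers of (m, v) have a
    # closed form: a size-weighted mean and a size-weighted mean squared error.
    m_hat = sum(k * y for k, y in zip(pool_sizes, pool_means)) / sum(pool_sizes)
    v_hat = sum(k * (y - m_hat) ** 2 for k, y in zip(pool_sizes, pool_means)) / n
    # Map (m, v) back to the lognormal parameters.
    sigma2_hat = math.log1p(v_hat / m_hat ** 2)
    mu_hat = math.log(m_hat) - sigma2_hat / 2
    return mu_hat, sigma2_hat


if __name__ == "__main__":
    rng = random.Random(0)
    means, sizes = simulate_pools(2000, 5, mu=0.0, sigma=0.5, rng=rng)
    mu_hat, sigma2_hat = fit_pooled_lognormal(means, sizes)
    print(f"mu_hat={mu_hat:.3f}, sigma2_hat={sigma2_hat:.3f}")
```

Because the first two moments of a pool average are exact even when its full distribution is not Gaussian, estimates of this kind remain consistent as the number of pools grows, which matches the claim in the abstract. The closed form is a convenience of this simplified setting; with weighted averages or covariates, the Gaussian likelihood would generally need numerical maximization.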