Carroll R J, Freedman L, Pee D
Department of Statistics, Texas A&M University, College Station, USA.
Biometrics. 1997 Dec;53(4):1440-57.
Motivated by an example in nutritional epidemiology, we investigate some design and analysis aspects of linear measurement error models with missing surrogate data. The specific problem consists of an initial large sample in which the response (a food frequency questionnaire, FFQ) is observed, followed by a smaller calibration study in which replicates of the error-prone predictor (food records or recalls, FR) are observed. Our analysis differs from most of the measurement error model literature in that selection into the calibration study can depend on the value of the response. The rationale for this type of design is given. Two major problems are investigated. The first is one of design: in a calibration study, one has the option of larger sample sizes with fewer replicates or smaller sample sizes with more replicates. Somewhat surprisingly, neither strategy is uniformly preferable in cases of practical interest; the answer depends on the instrument used (recalls or records) and the parameters of interest. The second problem is one of analysis. In the usual linear model with no missing data, method of moments estimates and normal-theory maximum likelihood estimates are approximately equivalent, with the former more widely used because it can be calculated easily and explicitly. Both estimates are valid without any distributional assumptions. In contrast, in the missing data problem under consideration, only the moments estimate is distribution-free, but the maximum likelihood estimate has at least 50% greater precision in practical situations when normality holds. Implications for the design of nutritional calibration studies are discussed.
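As a rough illustration of the explicit method of moments calculation mentioned in the abstract, the sketch below handles only the simple complete-data case, in which every subject has both an FFQ value and k replicate FR values. It is not the estimator developed in the paper and does not address selection into the calibration study that depends on the response. The model it assumes (FFQ = beta0 + beta1 * T + error, and each FR replicate = T + independent error, with T the unobserved true intake) and all names (`moments_calibration`, `simulate`, the returned keys) are assumptions made for this example.

```python
import numpy as np


def moments_calibration(q, f):
    """Method of moments estimates for an assumed linear error model.

    Assumed model (an illustration, not taken from the paper):
        q[i]    = beta0 + beta1 * t[i] + eps[i]   (FFQ, the response)
        f[i, j] = t[i] + u[i, j]                  (replicate FR values)
    q has shape (n,), f has shape (n, k) with k >= 2 replicates.
    """
    q = np.asarray(q, dtype=float)
    f = np.asarray(f, dtype=float)
    n, k = f.shape
    if k < 2:
        raise ValueError("at least two replicates are needed")

    f_bar = f.mean(axis=1)

    # Within-person sample variance of the replicates estimates var(u).
    sigma_u2 = f.var(axis=1, ddof=1).mean()

    # var(f_bar) = var(t) + var(u) / k, so subtract the error share.
    var_t = f_bar.var(ddof=1) - sigma_u2 / k

    # cov(q, f_bar) = beta1 * var(t) under the assumed model.
    cov_q_f = np.cov(q, f_bar, ddof=1)[0, 1]
    beta1 = cov_q_f / var_t
    beta0 = q.mean() - beta1 * f_bar.mean()

    # var(q) = beta1^2 * var(t) + var(eps).
    sigma_eps2 = q.var(ddof=1) - beta1 ** 2 * var_t

    return {
        "beta0": beta0,
        "beta1": beta1,
        "var_t": var_t,
        "sigma_u2": sigma_u2,
        "sigma_eps2": sigma_eps2,
    }


def simulate(n=20000, k=2, seed=0):
    """Generate synthetic data from the assumed model (hypothetical values)."""
    rng = np.random.default_rng(seed)
    t = rng.normal(0.0, 1.0, size=n)                       # true intake
    q = 1.0 + 0.5 * t + rng.normal(0.0, 0.5, size=n)       # FFQ
    f = t[:, None] + rng.normal(0.0, 1.0, size=(n, k))     # FR replicates
    return q, f
```

Calling `moments_calibration(*simulate())` returns the estimates as a dictionary. Every quantity is a closed-form function of sample means, variances, and covariances, which is the sense in which such estimates can be "calculated easily and explicitly"; a maximum likelihood fit would instead require specifying a distribution such as the normal.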