Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, TX 77030, USA.
Am J Epidemiol. 2010 Mar 15;171(6):682-90. doi: 10.1093/aje/kwp449. Epub 2010 Feb 5.
In the current era of diet-gene analyses, large sample sizes are required to uncover the etiology of complex diseases. As such, consortia form and often combine available data. Food frequency questionnaires, which commonly use 2 different types of responses about the frequency of intake (predefined responses and open-ended responses), may be pooled to achieve the desired sample size. The common practice is to categorize open-ended responses into the predefined response categories. A problem arises when the predefined categories are noncontiguous: possible open-ended responses may fall in gaps between the predefined categories. Using simulated data modeled from frequency of intake among 1,664 controls in a lung cancer case-control study at The University of Texas M. D. Anderson Cancer Center (Houston, Texas, 2000-2005), the authors describe the effect of different categories of open-ended responses that fall in between noncontiguous, predefined response sets on estimates of the mean difference in intake and the odds ratios. A significant inflation of false positives appears when comparing mean differences of intake, while the bias in estimating odds ratios may be acceptably small. Therefore, if pooling data cannot be restricted to the same type of response, inferences should focus on odds ratio estimation to minimize bias.
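The gap problem described above can be illustrated with a minimal sketch. The category labels and bounds below are hypothetical and are not taken from the study's questionnaire. The helper `categorize` and its `gap_rule` parameter are likewise assumptions made for illustration, not the authors' method. The sketch only shows that an open-ended response falling between two noncontiguous categories must be assigned by some arbitrary rule, and that the choice of rule changes the resulting category.

```python
# Hypothetical predefined categories, expressed in times per week with
# inclusive bounds. These are illustrative assumptions only; note the gaps
# between categories (e.g. nothing covers 1.5 or 4.5 times per week).
CATEGORIES = [
    ("never", 0.0, 0.0),
    ("1 per week", 1.0, 1.0),
    ("2-4 per week", 2.0, 4.0),
    ("5-6 per week", 5.0, 6.0),
    ("1 or more per day", 7.0, float("inf")),
]


def categorize(freq, gap_rule="down"):
    """Map an open-ended weekly frequency to a predefined category.

    Returns (label, fell_in_gap). When freq lies between two categories,
    gap_rule decides whether it goes to the category below ("down") or
    the category above ("up").
    """
    if freq < 0:
        raise ValueError("frequency must be non-negative")
    if gap_rule not in ("down", "up"):
        raise ValueError("gap_rule must be 'down' or 'up'")

    # Direct match: the response lies inside a predefined category.
    for label, low, high in CATEGORIES:
        if low <= freq <= high:
            return label, False

    # Otherwise the response sits in a gap between two categories.
    below = [c for c in CATEGORIES if c[2] < freq]
    above = [c for c in CATEGORIES if c[1] > freq]
    chosen = below[-1] if gap_rule == "down" else above[0]
    return chosen[0], True


if __name__ == "__main__":
    # Made-up open-ended responses (times per week), not study data.
    for freq in [0, 1, 1.5, 3, 4.5, 7]:
        down_label, in_gap = categorize(freq, "down")
        up_label, _ = categorize(freq, "up")
        print(freq, down_label, up_label, "gap" if in_gap else "")
```

Under these assumed categories, a response of 1.5 times per week is assigned to "1 per week" when rounding down and to "2-4 per week" when rounding up, while a response of 3 is assigned to "2-4 per week" under either rule.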