Wong M Y, Day N E, Luan J A, Chan K P, Wareham N J
Department of Mathematics, The Hong Kong University of Science & Technology, Hong Kong.
Int J Epidemiol. 2003 Feb;32(1):51-7. doi: 10.1093/ije/dyg002.
The search for biologically relevant gene-environment interactions has been facilitated by technological advances in genotyping. The design of studies to detect interactions on continuous traits such as blood pressure and insulin sensitivity is attracting increasing attention. We have previously described power calculations for such studies, and this paper describes the extension of those calculations to take account of measurement error.
The model considered in this paper is a simple linear regression relating a continuous outcome to a continuously distributed exposure variable, in which the ratio of the slopes for each genotype is taken as the interaction parameter. The classical measurement error model is used to describe the uncertainty in the measurement of both the outcome and the exposure. Sample sizes are calculated in two ways. First, for a given main effect observed with error in both the exposure and the outcome, we calculate the sample size needed to detect interactions of differing magnitude across a range of minor allele frequencies. Second, for a given interaction and a given minor allele frequency, we calculate the sample size needed under differing degrees of measurement error in the exposure and the outcome.
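The effect of classical measurement error on such a regression can be illustrated with a short sketch. It is not the sample size formula from the paper, which the abstract does not give. It only encodes the standard regression-dilution results: error in the exposure multiplies the expected slope by the squared correlation between the measured and true exposure, and error in both variables multiplies the expected correlation by the product of the two correlations. The function names are hypothetical, and the sketch assumes that errors are independent of the true values and of each other, and that the exposure and error variances are the same in both genotype groups.

```python
# Illustrative sketch of attenuation under the classical measurement error
# model. Function names are hypothetical; this is NOT the paper's sample
# size formula, only the textbook regression-dilution relationships.

def attenuated_slope(beta_true: float, rho_tx: float) -> float:
    """Expected slope of the outcome on the error-prone exposure.

    rho_tx is the correlation between measured and true exposure, so
    rho_tx ** 2 is the reliability of the exposure measurement. Error in
    the outcome does not bias the slope; it only adds residual variance.
    """
    return beta_true * rho_tx ** 2


def attenuated_correlation(r_true: float, rho_tx: float, rho_ty: float) -> float:
    """Expected exposure-outcome correlation when both are measured with error."""
    return r_true * rho_tx * rho_ty


def slope_ratio(beta_common: float, beta_minor: float, rho_tx: float) -> float:
    """Ratio of attenuated slopes (minor-allele group over common-allele group)."""
    return attenuated_slope(beta_minor, rho_tx) / attenuated_slope(beta_common, rho_tx)


if __name__ == "__main__":
    # Assumption for illustration only: 0.25 is treated as the true
    # standardized slope in the common-allele group, doubled to 0.5 in
    # the minor-allele group.
    for rho_tx, rho_ty in [(0.3, 0.4), (0.7, 0.7)]:
        print(
            f"rho_Tx={rho_tx}, rho_Ty={rho_ty}: "
            f"slope={attenuated_slope(0.25, rho_tx):.4f}, "
            f"correlation={attenuated_correlation(0.25, rho_tx, rho_ty):.4f}, "
            f"slope ratio={slope_ratio(0.25, 0.5, rho_tx):.2f}"
        )
```

Under these assumptions the ratio of slopes is unchanged by measurement error, because both slopes shrink by the same factor. What is lost is precision: the observed associations become much weaker, which is why the sample size needed to detect a given ratio grows so quickly as the measurements deteriorate.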
The required sample size depends on the magnitude of the interaction, the allele frequency and the strength of the association in those with the common allele. As an example, we take the situation in which the effect size in those with the common allele is a quarter of a standard deviation change in the outcome for each standard deviation change in the exposure. If a minor allele with a frequency of 20% leads to a doubling of that effect size, then the sample size is highly dependent upon the precision with which the exposure and outcome are measured. Here rho(Tx) and rho(Ty) denote the correlations between the measured and true values of the exposure and the outcome, respectively. If poor measures of the exposure and outcome are used (e.g. rho(Tx) = 0.3, rho(Ty) = 0.4), then a study size of 150 989 people would be required to detect the interaction with 95% power at a significance level of 10^(-4). Such an interaction could be detected in study samples of under 10 000 people if more precise measurements of exposure and outcome were made (e.g. rho(Tx) = 0.7, rho(Ty) = 0.7), and possibly in samples of under 5000 if the precision of estimation were enhanced by taking repeated measurements.
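The gain from repeated measurements can be sketched with the Spearman-Brown relationship, which gives the reliability of the mean of k replicate measurements. This is a standard result offered as an illustration, not the calculation used in the paper. The function name is hypothetical, and the sketch assumes that the replicates have independent errors of equal variance, so that averaging reduces only the error component.

```python
import math

# Illustrative sketch: how averaging k replicate measurements raises the
# correlation between the measured and the true value. The function name is
# hypothetical; the formula is the standard Spearman-Brown relationship and
# assumes independent, equal-variance errors across replicates.

def rho_of_mean(rho_single: float, k: int) -> float:
    """Correlation between the mean of k replicates and the true value."""
    if k < 1:
        raise ValueError("k must be at least 1")
    reliability = rho_single ** 2  # reliability of a single measurement
    reliability_k = k * reliability / (1 + (k - 1) * reliability)
    return math.sqrt(reliability_k)


if __name__ == "__main__":
    for rho in (0.3, 0.4, 0.7):
        improved = ", ".join(f"k={k}: {rho_of_mean(rho, k):.3f}" for k in (1, 2, 4))
        print(f"single-measurement rho={rho} -> {improved}")
```

Each additional replicate yields a smaller improvement than the one before, so in practice the number of repeats would be weighed against the cost of each measurement.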
The formulae for calculating the sample size required to study the interaction between a continuous exposure and a genetic factor on a continuous outcome variable in the face of measurement error will be of considerable utility in designing studies with appropriate power. These calculations suggest that smaller studies with repeated and more precise measurement of the exposure and outcome will be as powerful as studies even 20 times bigger, which necessarily employ less precise measures because of their size. Even though the cost of genotyping is falling, the magnitude of the effect of measurement error on the power to detect interaction on continuous traits suggests that investment in studies with better measurement may be a more appropriate strategy than attempting to deal with error by increasing sample sizes.