Lin D Y, Zeng D
Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599-7420, USA.
Genet Epidemiol. 2009 Apr;33(3):256-65. doi: 10.1002/gepi.20377.
Case-control association studies often collect extensive information on secondary phenotypes, which are quantitative or qualitative traits other than the case-control status. Exploring secondary phenotypes can yield valuable insights into biological pathways and identify genetic variants influencing phenotypes of direct interest. All publications on secondary phenotypes have used standard statistical methods, such as least-squares regression for quantitative traits. Because of unequal selection probabilities between cases and controls, the case-control sample is not a random sample from the general population. As a result, standard statistical analysis of secondary phenotype data can be extremely misleading. Although one may avoid the sampling bias by analyzing cases and controls separately or by including the case-control status as a covariate in the model, the associations between a secondary phenotype and a genetic variant in the case and control groups can be quite different from the association in the general population. In this article, we present novel statistical methods that properly reflect the case-control sampling in the analysis of secondary phenotype data. The new methods provide unbiased estimation of genetic effects and accurate control of false-positive rates while maximizing statistical power. We demonstrate the pitfalls of the standard methods and the advantages of the new methods both analytically and numerically. The relevant software is available at our website.
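The sampling bias described above can be illustrated with a small simulation. The sketch below is not the authors' method or software: it is a hypothetical setup in which the genetic variant has no effect on the secondary phenotype in the population, yet both the variant and the phenotype raise disease risk. All names and parameter values (`simulate_population`, `a0`, `a_g`, `a_y`, the minor allele frequency of 0.3) are assumptions chosen for illustration. It compares ordinary least squares on the pooled case-control sample with inverse-probability-weighted least squares, a simple correction swapped in here that assumes the population disease prevalence is known.

```python
import numpy as np


def simulate_population(n_pop, rng, beta_g=0.0, maf=0.3, a0=-5.0, a_g=1.0, a_y=1.0):
    """Hypothetical population: genotype G, secondary trait Y, disease status D."""
    g = rng.binomial(2, maf, size=n_pop).astype(float)  # additive genotype coded 0/1/2
    y = beta_g * g + rng.standard_normal(n_pop)         # population slope of Y on G is beta_g
    # Assumed logistic disease model in which both G and Y raise risk.
    p_disease = 1.0 / (1.0 + np.exp(-(a0 + a_g * g + a_y * y)))
    d = rng.random(n_pop) < p_disease
    return g, y, d


def draw_case_control(d, n_per_group, rng):
    """Sample equal numbers of cases and controls without replacement."""
    cases = rng.choice(np.flatnonzero(d), size=n_per_group, replace=False)
    controls = rng.choice(np.flatnonzero(~d), size=n_per_group, replace=False)
    return np.concatenate([cases, controls])


def slope(g, y, w=None):
    """(Weighted) least-squares slope of y on g."""
    w = np.ones_like(g) if w is None else w
    g_bar = np.average(g, weights=w)
    y_bar = np.average(y, weights=w)
    return np.sum(w * (g - g_bar) * (y - y_bar)) / np.sum(w * (g - g_bar) ** 2)


def run_demo(seed=0, n_pop=400_000, n_per_group=4_000):
    rng = np.random.default_rng(seed)
    g, y, d = simulate_population(n_pop, rng)
    idx = draw_case_control(d, n_per_group, rng)
    gs, ys, ds = g[idx], y[idx], d[idx]

    # Standard analysis: ignore the sampling scheme and pool cases with controls.
    naive = slope(gs, ys)

    # Weight each subject by (population share of its group) / (sample share).
    # Assumption: the prevalence is known; here it is read off the simulated population.
    prevalence = d.mean()
    weights = np.where(ds, prevalence / 0.5, (1.0 - prevalence) / 0.5)
    weighted = slope(gs, ys, weights)
    return naive, weighted


if __name__ == "__main__":
    naive, weighted = run_demo()
    print(f"true slope: 0.0  naive OLS: {naive:.3f}  weighted: {weighted:.3f}")
```

Under these assumed parameters the pooled estimate should sit clearly above zero, because cases carry both more risk alleles and higher phenotype values than controls, while the weighted estimate should stay close to the true value of zero. Weighting is used only because it is short to write down; it gives most of the weight to controls and is therefore generally less efficient than a likelihood-based analysis.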