Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599-7420, USA.
Am J Hum Genet. 2014 Dec 4;95(6):675-88. doi: 10.1016/j.ajhg.2014.11.005.
The cohort design allows investigators to explore the genetic basis of a variety of diseases and traits in a single study while avoiding major weaknesses of the case-control design. Most cohort studies employ multistage cluster sampling with unequal probabilities to conveniently select participants with desired characteristics, and participants from different clusters might be genetically related. Analysis that ignores the complex sampling design can yield biased estimation of the genetic association and inflation of the type I error. Herein, we develop weighted estimators that reflect unequal selection probabilities and differential nonresponse rates, and we derive variance estimators that properly account for the sampling design and the potential relatedness of participants in different sampling units. We compare, both analytically and numerically, the performance of the proposed weighted estimators with unweighted estimators that disregard the sampling design. We demonstrate the usefulness of the proposed methods through analysis of MetaboChip data in the Hispanic Community Health Study/Study of Latinos, which is the largest health study of the Hispanic/Latino population in the United States aimed at identifying risk factors for various diseases and determining the role of genes and environment in the occurrence of diseases. We provide guidelines on the use of weighted and unweighted estimators, as well as the relevant software.
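As a rough illustration of the general idea behind a weighted estimator with a design-aware variance, the sketch below fits a sampling-weighted linear regression of a trait on a genotype and computes a sandwich variance that sums score contributions within sampling clusters. It is not the authors' method or software: the function name `weighted_association`, its arguments, and the single-level clustering are assumptions made for illustration, and the sketch does not handle multistage strata, nonresponse adjustment, or genetic relatedness across clusters, all of which the paper addresses.

```python
import numpy as np

def weighted_association(y, g, weights, cluster):
    """Hypothetical helper: weighted least-squares fit of trait y on genotype g.

    weights : sampling weights (e.g. inverse selection probabilities)
    cluster : label of the sampling cluster each participant belongs to
    Returns (beta, cov): intercept and slope, plus a cluster-robust
    sandwich covariance matrix for them.
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    w = np.asarray(weights, dtype=float)
    cluster = np.asarray(cluster)

    # Design matrix with an intercept column and the genotype column.
    X = np.column_stack([np.ones_like(g), g])

    # Weighted normal equations: (X' W X) beta = X' W y.
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))

    # Per-participant weighted score contributions.
    resid = y - X @ beta
    scores = X * (w * resid)[:, None]

    # Sum the scores within each cluster so that within-cluster
    # correlation is reflected in the variance.
    labels, idx = np.unique(cluster, return_inverse=True)
    cluster_scores = np.zeros((len(labels), X.shape[1]))
    np.add.at(cluster_scores, idx, scores)

    # Sandwich form: bread * meat * bread.
    bread = np.linalg.inv(XtWX)
    meat = cluster_scores.T @ cluster_scores
    cov = bread @ meat @ bread
    return beta, cov
```

Setting all weights to 1 reduces the point estimate to the ordinary unweighted least-squares fit, which makes it easy to compare weighted and unweighted estimates on the same data.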