Two-stage sampling designs for gene association studies.

Author information

Thomas Duncan, Xie Rongrong, Gebregziabher Mulugeta

Affiliation

Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089-9011, USA.

Publication information

Genet Epidemiol. 2004 Dec;27(4):401-14. doi: 10.1002/gepi.20047.

DOI:10.1002/gepi.20047

PMID:15543639

Abstract

We consider two-stage case-control designs for testing associations between single nucleotide polymorphisms (SNPs) and disease, in which a subsample of subjects is used to select a panel of "tagging" SNPs that will be considered in the main study. We propose a pseudolikelihood [Pepe and Fleming, 1991: JASA 86:108-113] that combines the information from both the main study and the substudy to test the association with any polymorphism in the original set. SNP-tagging [Chapman et al., 2003: Hum Hered 56:18-31] and haplotype-tagging [Stram et al., 2003a: Hum Hered 55:27-36] approaches are compared. We show that the cost-efficiency of such a design for estimating the relative risk associated with the causal polymorphism can be considerably better than for a single-stage design, even if the causal polymorphism is not included in the tag-SNP set. We also consider the optimal selection of cases and controls in such designs and the relative efficiency for estimating the location of a causal variant in linkage disequilibrium mapping. Nevertheless, as the cost of high-volume genotyping plummets and haplotype-tagging information from the International HapMap Project [Gibbs et al., 2003: Nature 426:789-796] rapidly accumulates in public databases, such two-stage designs may soon become unnecessary.
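For orientation, the general shape of a Pepe-Fleming style pseudolikelihood (also called an estimated likelihood) with a discrete surrogate can be sketched as follows. This is the generic form from the cited 1991 method, not necessarily the exact likelihood used in this paper. The mapping of symbols to this design is an assumption made here for illustration: $X$ stands for the full SNP genotypes (observed only in the substudy $V$), $Z$ for the tag-SNP genotypes (observed in everyone), $Y$ for disease status, and $\beta$ for the log relative risk parameters. Any adjustment for case-control sampling is omitted.

```latex
% Subjects in the substudy V contribute the disease model directly;
% all other subjects contribute an estimated model given tag SNPs only.
\[
  \hat{L}(\beta)
  \;=\;
  \prod_{i \in V} P_\beta\!\left(Y_i \mid X_i\right)
  \;\times\;
  \prod_{j \notin V} \hat{P}_\beta\!\left(Y_j \mid Z_j\right)
\]

% The estimated term averages the disease model over the substudy
% subjects who share the same tag-SNP genotype z.
\[
  \hat{P}_\beta\!\left(y \mid z\right)
  \;=\;
  \frac{1}{n_V(z)} \sum_{i \in V:\; Z_i = z} P_\beta\!\left(y \mid X_i\right),
  \qquad
  n_V(z) = \#\{\, i \in V : Z_i = z \,\}
\]
```

In this sketch the substudy supplies the empirical distribution of the full genotype given the tag genotype, which is how main-study subjects typed only at tag SNPs can still inform a test at an untyped polymorphism.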
