Strategies for genome-wide association studies: optimization of study designs by the stepwise focusing method.

Authors

Saito Akira, Kamatani Naoyuki

Affiliation

Division of Statistical Genetics, Institute of Rheumatology, Tokyo Women's Medical University, Tokyo, Japan.

Publication information

J Hum Genet. 2002;47(7):360-5. doi: 10.1007/s100380200050.

DOI:10.1007/s100380200050

PMID:12111370

Abstract

Recently, the use of genome-wide linkage disequilibrium (LD) analysis to localize traits has attracted much attention because of the introduction of high-throughput genotyping systems. However, a limitation of such studies is often the total cost of genotyping in addition to sample size. Therefore, it is important to estimate optimal conditions for such a study given the total cost of genotyping. In the present study, we have introduced the "stepwise focusing method," in which candidate markers are selected in a stepwise fashion. In the first focusing step, samples from both case and control groups are genotyped at a certain number of single-nucleotide polymorphisms (SNPs) (for example, 50,000), and the markers that exhibit significant intergroup differences by a χ² test are selected. In the first step, the risk of type I error is set rather high (for example, 0.1), and, therefore, most of the selected markers are false positives. In the second step, the markers selected in the first step are tested by using samples obtained from a different set of case-control samples. We performed extensive simulation studies to estimate both the type I error and the power of the test by changing parameters such as genotype relative risk, disease allele frequency, and sample size. If the total number of genotypings was limited, the stepwise focusing method yielded optimal conditions and was more powerful than conventional methods.
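The two-step design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the abstract does not give the genetic model, the test layout, or the second-step threshold, so the sketch assumes an allelic 2x2 chi-square test, a multiplicative genotype relative risk, controls that carry the population allele frequency (a rare-disease approximation), and a user-chosen `alpha2`. All function names are hypothetical.

```python
import math
import random


def chi2_pvalue_2x2(a, b, c, d):
    """P-value of a 1-df chi-square test on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a degenerate table carries no evidence of association
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def expected_genotypings(n_markers, alpha1, n_step1, n_step2):
    """Expected genotyping count; n_step1 and n_step2 are samples per group.

    Assumption: almost every marker is null, so a fraction of about
    alpha1 of the markers passes the first step by chance.
    """
    step1 = n_markers * 2 * n_step1
    step2 = n_markers * alpha1 * 2 * n_step2
    return step1 + step2


def case_allele_freq(p, grr):
    """Risk-allele frequency in cases under a multiplicative model (assumption)."""
    return grr * p / (grr * p + (1.0 - p))


def draw_allele_count(rng, n_people, freq):
    """Number of risk alleles among the 2 * n_people sampled chromosomes."""
    return sum(rng.random() < freq for _ in range(2 * n_people))


def marker_pvalue(rng, n_per_group, p, grr):
    """Simulate one case-control comparison at the disease marker itself."""
    n_chrom = 2 * n_per_group
    case_alt = draw_allele_count(rng, n_per_group, case_allele_freq(p, grr))
    ctrl_alt = draw_allele_count(rng, n_per_group, p)
    return chi2_pvalue_2x2(case_alt, n_chrom - case_alt,
                           ctrl_alt, n_chrom - ctrl_alt)


def two_step_power(rng, n_step1, n_step2, p, grr, alpha1, alpha2, reps=200):
    """Fraction of replicates in which the marker passes both steps.

    Each step draws a fresh, independent set of cases and controls.
    """
    hits = 0
    for _ in range(reps):
        passed_first = marker_pvalue(rng, n_step1, p, grr) < alpha1
        if passed_first and marker_pvalue(rng, n_step2, p, grr) < alpha2:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    rng = random.Random(0)
    print(expected_genotypings(50000, 0.1, 200, 200))
    print(two_step_power(rng, 200, 200, p=0.2, grr=1.5,
                         alpha1=0.1, alpha2=1e-4))
```

With the abstract's example values (50,000 SNPs, first-step type I error of 0.1) and an illustrative 200 cases and 200 controls in each step, the expected cost is about 22 million genotypings, versus 40 million if all 400 cases and 400 controls were typed at every marker in a single step. The freed budget can then be spent on larger samples, which is the trade-off the simulations in the paper explore.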
