Probability of detecting disease-associated single nucleotide polymorphisms in case-control genome-wide association studies.

Author information

Gail Mitchell H, Pfeiffer Ruth M, Wheeler William, Pee David

Affiliation

Division of Cancer Epidemiology and Genetics, National Cancer Institute, 6120 Executive Boulevard, EPS 8032, Bethesda, MD 20892-7244, USA.

Publication information

Biostatistics. 2008 Apr;9(2):201-15. doi: 10.1093/biostatistics/kxm032. Epub 2007 Sep 14.

Abstract

Some case-control genome-wide association studies (CCGWASs) select promising single nucleotide polymorphisms (SNPs) by ranking corresponding p-values, rather than by applying the same p-value threshold to each SNP. For such a study, we define the detection probability (DP) for a specific disease-associated SNP as the probability that the SNP will be "T-selected," namely have one of the top T largest chi-square values (or smallest p-values) for trend tests of association. The corresponding proportion positive (PP) is the fraction of selected SNPs that are true disease-associated SNPs. We study DP and PP analytically and via simulations, both for fixed and for random effects models of genetic risk, that allow for heterogeneity in genetic risk. DP increases with genetic effect size and case-control sample size and decreases with the number of nondisease-associated SNPs, mainly through the ratio of T to N, the total number of SNPs. We show that DP increases very slowly with T, and the increment in DP per unit increase in T declines rapidly with T. DP is also diminished if the number of true disease SNPs exceeds T. For a genetic odds ratio per minor disease allele of 1.2 or less, even a CCGWAS with 1000 cases and 1000 controls requires T to be impractically large to achieve an acceptable DP, leading to PP values so low as to make the study futile and misleading. We further calculate the sample size of the initial CCGWAS that is required to minimize the total cost of a research program that also includes follow-up studies to examine the T-selected SNPs. A large initial CCGWAS is desirable if genetic effects are small or if the cost of a follow-up study is large.
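
The definition of DP above can be illustrated with a small Monte Carlo sketch. This is not the authors' analytic method or their simulation code. It assumes a single disease SNP, independent SNPs, null chi-square statistics with 1 degree of freedom, and a disease SNP whose trend-test z-statistic is normal with mean `mu` and unit variance. The parameter `mu` is a hypothetical stand-in for the combined effect of odds ratio, allele frequency, and sample size, and the function name `simulate_dp` is also illustrative.

```python
import random


def simulate_dp(mu, n_snps, top_t, n_reps=2000, seed=0):
    """Monte Carlo estimate of the detection probability (DP) for one disease SNP.

    Assumptions (not taken from the paper): SNPs are independent, exactly one
    SNP is disease-associated, and its z-statistic is N(mu, 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        # Chi-square statistic (1 df) of the disease SNP.
        disease_chi2 = rng.gauss(mu, 1.0) ** 2
        # Count null SNPs whose chi-square statistic exceeds it.
        larger = 0
        for _ in range(n_snps - 1):
            if rng.gauss(0.0, 1.0) ** 2 > disease_chi2:
                larger += 1
                if larger >= top_t:
                    break  # already pushed out of the top T
        # The SNP is "T-selected" if fewer than T null SNPs outrank it.
        if larger < top_t:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    for t in (1, 10, 50):
        print(t, simulate_dp(mu=3.0, n_snps=1000, top_t=t, n_reps=500))
```

Under these assumptions, a SNP with no effect (`mu = 0`) is selected with probability close to T/N, which is one way to see why the ratio of T to N matters in the abstract's description.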
