Zaykin Dmitri V, Zhivotovsky Lev A
National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA.
Genetics. 2005 Oct;171(2):813-23. doi: 10.1534/genetics.105.044206. Epub 2005 Jul 14.
With recent advances in high-throughput genotyping techniques, it is now possible to perform whole-genome association studies to fine-map causal polymorphisms underlying important traits that influence susceptibility to human diseases and the efficacy of drugs. Once a genome scan is completed, the results can be sorted by the value of the association statistic. What is the probability that true positives will be encountered among the top-ranked, most strongly associated markers? When a particular polymorphism is found to be associated with the trait, it may represent either a "true" or a "false" association (TA vs. FA). Setting appropriate significance thresholds has been regarded as a way to ensure sufficient odds that the associations found to be significant are genuine. However, the problem with genome scans involving thousands of markers is that the statistic values of FAs can reach quite extreme magnitudes. In such situations, the distributions corresponding to TAs and to the most extreme FAs become comparable, and significance thresholds tend to penalize TAs and FAs in a similar fashion. When sorting true from false associations, the "typical" place (i.e., rank) of TAs among the most significant outcomes, ordered by the association statistic value, becomes important. The distribution of ranks that we study here allows calculation of several useful quantities. In particular, it gives the number of most significant markers that must be carried into a follow-up study so that a true association is included with a specified probability. This number can be calculated conditionally on a multiple-testing correction having been applied. The effects of multilocus tests (e.g., haplotype association) and the impact of linkage disequilibrium on the distribution of ranks associated with TAs are evaluated and can be taken into account.
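The ranking question raised in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' derivation of the rank distribution. It is a brute-force simulation under simplifying assumptions that are ours, not the paper's: a single TA, independent markers (no linkage disequilibrium), FA statistics drawn as |N(0, 1)|, and the TA statistic drawn as |N(effect, 1)|. The function names and the `effect` parameter are hypothetical.

```python
import math
import random


def simulate_ta_ranks(n_markers, effect, n_reps, seed=0):
    """Simulate the rank of one true association (TA) among n_markers tests.

    Assumptions (illustrative only): markers are independent, false
    associations (FAs) have statistic |N(0, 1)|, the TA has |N(effect, 1)|.
    """
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_reps):
        ta_stat = abs(rng.gauss(effect, 1.0))
        # Rank 1 is the most significant marker, so the TA's rank is
        # one plus the number of FAs whose statistic exceeds the TA's.
        n_larger = sum(
            abs(rng.gauss(0.0, 1.0)) > ta_stat for _ in range(n_markers - 1)
        )
        ranks.append(1 + n_larger)
    return ranks


def prob_in_top(ranks, k):
    """Estimated probability that the TA is among the k top-ranked markers."""
    return sum(r <= k for r in ranks) / len(ranks)


def markers_needed(ranks, prob):
    """Smallest k such that the TA is in the top k in at least `prob` of runs."""
    ordered = sorted(ranks)
    idx = max(math.ceil(prob * len(ordered)) - 1, 0)
    return ordered[idx]


if __name__ == "__main__":
    ranks = simulate_ta_ranks(n_markers=1000, effect=3.0, n_reps=500)
    print("P(TA in top 10):", prob_in_top(ranks, 10))
    print("Markers needed for 90% inclusion:", markers_needed(ranks, 0.9))
```

Under these assumptions, `markers_needed` plays the role of the follow-up set size described in the abstract: the number of top-ranked markers that must be retained so that the TA is included with the chosen probability. Correlated markers, multilocus tests, and conditioning on a multiple-testing correction are deliberately left out of this sketch.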