Curtis David, Vine Anna E, Knight Jo
Centre for Psychiatry, Queen Mary's School of Medicine and Dentistry, London, UK.
BMC Genet. 2007 May 10;8:20. doi: 10.1186/1471-2156-8-20.
Researchers may embark on a genome-wide association study before fully investigating candidate regions for which evidence has been reported suggesting that they harbour susceptibility loci. Had the genome-wide study not been carried out, results from candidate regions showing only modest statistical significance would have been judged to be of interest and would have stimulated further investigation. However, if hundreds of thousands of markers are typed, very large numbers of such results will inevitably occur by chance, and those from candidate regions may attract no special attention.
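As a rough illustration of the scale involved, consider a hypothetical scan of m = 500,000 markers (a figure chosen only for illustration) in which no marker is truly associated. Under the null hypothesis each p value is uniformly distributed, so by linearity of expectation the expected number of markers reaching a nominal threshold α is mα, whatever the correlation between markers:

```latex
E[N_{\alpha}] = m\,\alpha = 500\,000 \times 0.01 = 5\,000
```

A candidate-region result at p = 0.01 would therefore sit among roughly five thousand chance findings of equal or greater nominal significance.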
We propose an approach in which markers from candidate regions are treated differently from those routinely typed as part of a genome-wide scan. Different prior probabilities of association are assigned to the two types of marker. A likelihood ratio is derived from the reported p value for each marker, calculated as LR = exp(chiinv(1,p)/2), where chiinv(1,p) is the 1-df chi-squared value whose upper-tail probability is p. Multiplying this likelihood ratio by the prior odds gives the posterior odds in favour of a true positive association. These odds can be used to rank the markers and so suggest the regions in which further genotyping is indicated. We suggest that prior probabilities be specified such that a candidate marker significant at p = 0.01 and a routine marker significant at p = 0.00001 yield similar posterior odds. We show that this can be achieved by setting the prior probability of association to 0.1 for candidate markers and to 0.00018 for routine markers.
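The conversion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software: the function names (`chiinv_1df`, `likelihood_ratio`, `posterior_odds`, `matching_prior`) are hypothetical, and instead of a spreadsheet-style `chiinv` the 1-df chi-squared quantile is computed from the identity chi-squared(1) = Z², using `statistics.NormalDist` from the standard library. The marker names and p values in the ranking example are invented.

```python
from math import exp
from statistics import NormalDist


def chiinv_1df(p: float) -> float:
    """1-df chi-squared value whose upper-tail probability is p.

    Assumption: uses chi2_1 = Z^2, so the upper-tail p of chi2_1 equals the
    two-sided tail probability of a standard normal, giving z = -Phi^-1(p/2).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = -NormalDist().inv_cdf(p / 2.0)
    return z * z


def likelihood_ratio(p: float) -> float:
    """LR = exp(chiinv(1, p) / 2), the formula quoted in the abstract."""
    return exp(chiinv_1df(p) / 2.0)


def posterior_odds(p: float, prior: float) -> float:
    """Posterior odds = LR x prior odds, where prior odds = prior / (1 - prior)."""
    return likelihood_ratio(p) * prior / (1.0 - prior)


def matching_prior(p_ref: float, prior_ref: float, p_target: float) -> float:
    """Hypothetical helper: the prior a marker at p_target needs so that its
    posterior odds equal those of a marker at p_ref with prior prior_ref."""
    target_prior_odds = posterior_odds(p_ref, prior_ref) / likelihood_ratio(p_target)
    return target_prior_odds / (1.0 + target_prior_odds)


# Calibration quoted in the abstract: candidate p = 0.01 with prior 0.1,
# routine p = 0.00001 with prior 0.00018.
cand = posterior_odds(0.01, 0.1)
rout = posterior_odds(1e-5, 0.00018)
print(f"candidate odds: {cand:.2f}, routine odds: {rout:.2f}")
print(f"routine prior matching the candidate: {matching_prior(0.01, 0.1, 1e-5):.6f}")

# Ranking invented markers: (name, reported p value, prior probability).
markers = [
    ("cand_A", 0.01, 0.1),
    ("cand_B", 0.2, 0.1),
    ("rs_X", 1e-5, 0.00018),
    ("rs_Y", 1e-3, 0.00018),
]
ranked = sorted(markers, key=lambda m: posterior_odds(m[1], m[2]), reverse=True)
for name, p, prior in ranked:
    print(f"{name}\tp={p:g}\tposterior odds={posterior_odds(p, prior):.4g}")
```

With these invented inputs the two calibrated markers receive posterior odds close to 3, and the candidate marker at p = 0.2 ranks above the routine marker at p = 0.001, which is the kind of differential treatment the approach is meant to produce. Solving for the matching routine prior gives about 0.000178, consistent with the rounded value of 0.00018.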
It is essential that formal procedures be adopted to ensure that modestly positive results from candidate regions are not swamped by the huge number of nominally significant results obtained when very many markers are genotyped. Software to convert p values to posterior odds is available from http://www.mds.qmul.ac.uk/statgen/grpsoft.html.