Avramopoulos Dimitrios, Zandi Peter, Gherman Adrian, Fallin M Daniele, Bassett Susan S
Department of Psychiatry, The Johns Hopkins University, School of Medicine, Broadway Research Building 509, 733 North Broadway, Baltimore, MD 21205, USA.
Hum Genomics. 2006 Jun;2(6):345-52. doi: 10.1186/1479-7364-2-6-345.
Genes for complex disorders have proven hard to find using linkage analysis. The results rarely reach the desired level of significance, and researchers have often failed to replicate positive findings. There is, however, a wealth of information from other scientific approaches that enables the formation of hypotheses on groups of genes or genomic regions likely to be enriched in disease loci. Examples include genes belonging to specific pathways or producing proteins that interact with known risk factors, genes that show altered expression levels in patients, or even the group of top-scoring locations in a linkage study. We show here that this hypothesis of enrichment for disease loci can be tested using genome-wide linkage data, provided that these data are independent of the data used to generate the hypothesis. Our method is based on the fact that non-parametric linkage analyses are expected to show increased scores at each of the disease loci, although this increase might not rise above the noise of stochastic variation. By using a summary statistic and calculating its empirical significance, we show that enrichment hypotheses can be tested with greater power than the linkage scan data have to identify individual loci. Via simulated linkage scans for a number of different models, we gain insight into the interpretation of genome scan results and test the power of our proposed method. We present an application of the method to real data from a late-onset Alzheimer's disease linkage scan as a proof of principle.
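The idea of testing an enrichment hypothesis with a summary statistic and an empirical significance level can be illustrated with a minimal sketch. The abstract does not specify the statistic or the null procedure, so everything here is an assumption made for illustration: the statistic is taken to be the sum of linkage scores at the hypothesized loci, the null distribution is built by drawing random sets of loci of the same size from the same scan, and the names `enrichment_p_value`, `scores`, and `candidate_idx` are hypothetical. Random draws of loci ignore the correlation between linked positions, which a simulation-based null, as the abstract describes, would account for.

```python
import random


def enrichment_p_value(scores, candidate_idx, n_draws=10000, seed=0):
    """Empirical p-value for the summed linkage score at candidate loci.

    scores        -- per-locus linkage scores from one genome scan (hypothetical input)
    candidate_idx -- indices of the loci named by the enrichment hypothesis
    n_draws       -- number of random locus sets used to build the null distribution
    """
    rng = random.Random(seed)
    k = len(candidate_idx)

    # Assumed summary statistic: sum of scores over the hypothesized loci.
    observed = sum(scores[i] for i in candidate_idx)

    # Assumed null: random sets of k loci drawn without replacement.
    all_idx = range(len(scores))
    hits = 0
    for _ in range(n_draws):
        null_sum = sum(scores[i] for i in rng.sample(all_idx, k))
        if null_sum >= observed:
            hits += 1

    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_draws + 1)


if __name__ == "__main__":
    # Toy scan: 95 background loci and 5 hypothesized loci with raised scores.
    toy_scores = [0.1] * 95 + [3.0] * 5
    stat, p = enrichment_p_value(toy_scores, [95, 96, 97, 98, 99], n_draws=2000)
    print(f"summary statistic = {stat:.2f}, empirical p = {p:.4f}")
```

The point of the sketch is that the hypothesized loci are tested as a group: each locus may carry only a modest score, yet the summed score can still stand out against random locus sets of the same size.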