Guedj Mickael, Robelin David, Hoebeke Mark, Lamarine Marc, Wojcik Jérôme, Nuel Gregory
Laboratoire Statistique et Génome.
Stat Appl Genet Mol Biol. 2006;5:Article22. doi: 10.2202/1544-6115.1192. Epub 2006 Sep 17.
Genetic epidemiology aims to identify the biological mechanisms responsible for human diseases. Genome-wide association studies, made possible by recent improvements in genotyping technologies, are now a promising line of investigation. In these studies, common first-stage strategies focus on marginal effects, but they raise a multiple-testing problem and cannot capture the possibly complex interplay between genetic factors. We have adapted the local score statistic, which has already been applied successfully to the analysis of long molecular sequences. Through sum statistics, this method captures local and possibly distant dependences between markers. Designed for genome-wide association studies, it is fast to compute, handles large datasets, circumvents the multiple-testing problem and outlines a set of genomic regions (segments) for further analysis. Applied to simulated and real data, our approach outperforms the classical Bonferroni and FDR corrections for multiple testing. It is implemented in a software package named LHiSA (Local High-scoring Segments for Association), available at: http://stat.genopole.cnrs.fr/software/lhisa.
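To illustrate the idea of a local score computed from sum statistics, here is a minimal Python sketch. It is not the LHiSA implementation. The abstract does not give the scoring function, so the per-marker score used here (minus log10 of a single-marker p-value, shifted down by a hypothetical threshold `delta` so that uninformative markers score negatively) is an assumption, as are the function names `marker_scores` and `local_score`. The sketch only locates the highest-scoring segment; assessing its significance is not shown.

```python
from typing import List, Tuple


def marker_scores(neg_log10_p: List[float], delta: float = 1.0) -> List[float]:
    """Turn per-marker -log10(p) values into signed scores.

    Assumption for illustration: subtracting a threshold `delta` makes
    weakly associated markers contribute negative scores.
    """
    return [v - delta for v in neg_log10_p]


def local_score(scores: List[float]) -> Tuple[float, int, int]:
    """Return (best_sum, start, end) for the highest-scoring contiguous segment.

    `end` is exclusive. Returns (0.0, 0, 0) if no segment has a positive sum.
    Runs in a single pass over the markers.
    """
    best, best_start, best_end = 0.0, 0, 0
    running, start = 0.0, 0
    for i, x in enumerate(scores):
        running += x
        if running <= 0.0:
            # A non-positive running sum cannot help any later segment: restart.
            running, start = 0.0, i + 1
        elif running > best:
            best, best_start, best_end = running, start, i + 1
    return best, best_start, best_end


# Made-up -log10(p) values for eight consecutive markers (not real data).
example = [0.2, 0.1, 2.5, 1.8, 0.3, 3.0, 0.1, 0.4]
best, seg_start, seg_end = local_score(marker_scores(example, delta=1.0))
```

In this toy input, the selected segment spans markers 2 to 5 (0-based): the weak marker at index 4 is absorbed because its strongly scoring neighbours outweigh it. This is how summing over a segment can pick up a regional signal that marker-by-marker tests treat separately.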