Bush William S, Dudek Scott M, Ritchie Marylyn D
Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232, USA.
Pac Symp Biocomput. 2009:368-79.
Genome-wide association studies provide an unprecedented opportunity to identify combinations of genetic variants that contribute to disease susceptibility. The combinatorial problem of jointly analyzing the millions of genetic variations accessible by high-throughput genotyping technologies is a difficult challenge. One approach to reducing the search space of this variable selection problem is to assess specific combinations of genetic variations based on prior statistical and biological knowledge. In this work, we provide a systematic approach to integrate multiple public databases of gene groupings and sets of disease-related genes to produce multi-SNP models that have an established biological foundation. This approach yields a collection of models which can be tested statistically in genome-wide data, along with an ordinal quantity describing the number of data sources that support any given model. Using this knowledge-driven approach reduces the computational and statistical burden of large-scale interaction analysis while simultaneously providing a biological foundation for the relevance of any significant statistical result that is found.
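The integration step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the source names, gene names, SNP identifiers, and the function `build_models` are all hypothetical. The sketch also assumes that two genes form a candidate model whenever they share a group in at least one source, and it restricts models to SNP pairs for simplicity. Each model carries a count of the sources that support it, corresponding to the ordinal quantity mentioned in the abstract.

```python
from collections import defaultdict
from itertools import combinations, product


def build_models(sources, gene_to_snps):
    """Return (snp1, snp2, n_sources) tuples, most-supported first.

    sources: dict mapping source name -> dict of group name -> set of genes
    gene_to_snps: dict mapping gene name -> list of SNP identifiers
    (Both structures are illustrative assumptions, not a real database schema.)
    """
    # Record which sources place each pair of genes in a common group.
    support = defaultdict(set)
    for source_name, groups in sources.items():
        for genes in groups.values():
            for gene_a, gene_b in combinations(sorted(genes), 2):
                support[(gene_a, gene_b)].add(source_name)

    # Expand each supported gene pair into SNP-SNP models.
    models = []
    for (gene_a, gene_b), supporting in support.items():
        snps_a = gene_to_snps.get(gene_a, ())
        snps_b = gene_to_snps.get(gene_b, ())
        for snp_a, snp_b in product(snps_a, snps_b):
            models.append((snp_a, snp_b, len(supporting)))

    # Highest support first; ties broken by SNP identifiers for stable output.
    models.sort(key=lambda m: (-m[2], m[0], m[1]))
    return models


# Hypothetical toy inputs: two knowledge sources and three genes.
sources = {
    "pathway_db": {"pathway_1": {"GENE_A", "GENE_B", "GENE_C"}},
    "interaction_db": {"complex_1": {"GENE_A", "GENE_B"}},
}
gene_to_snps = {
    "GENE_A": ["snp_a1", "snp_a2"],
    "GENE_B": ["snp_b1"],
    "GENE_C": ["snp_c1"],
}

models = build_models(sources, gene_to_snps)
for snp_a, snp_b, n_sources in models:
    print(snp_a, snp_b, n_sources)
```

In this toy example only SNP pairs drawn from genes that share a group are generated, so SNP pairs within the same gene (here `snp_a1` with `snp_a2`) are never tested. The support count lets models backed by more sources be prioritized.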