Grady Benjamin J, Torstenson Eric, Dudek Scott M, Giles Justin, Sexton David, Ritchie Marylyn D
Center for Human Genetics Research, Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN 37232, United States.
Pac Symp Biocomput. 2010:315-26.
Methods for detecting gene-gene interactions between variants in genome-wide association study (GWAS) datasets are not yet well developed. PLATO, the Platform for the Analysis, Translation and Organization of large-scale data, is a filter-based approach that brings many analytical methods together in an effort to solve this problem. PLATO filters a large genomic dataset down to a subset of genetic variants that may be useful for interaction analysis. As a precursor to using PLATO for the detection of gene-gene interactions, we implemented and evaluated a variety of single-locus filters as a proof of concept. To streamline PLATO for efficient epistasis analysis, we determined which of 24 analytical filters produced redundant results. Using a kappa score to measure agreement between filters, we grouped the analytical filters into four filter classes; all further analyses therefore employed four filters. We then tested the MAX statistic put forth by Sladek et al. (1) in simulated data exploring a number of genetic models of modest effect size. To compute the MAX statistic, the four filters were run on each SNP in each dataset, and the smallest p-value among the four results was taken as the final result. Permutation testing was performed to determine the p-value empirically. The power of the MAX statistic to detect each of the simulated effects was determined, along with the Type I error and false positive rates. The results of this simulation study demonstrate that PLATO using the four filters with the MAX statistic has, on average, higher power to find multiple types of effects and a lower false positive rate than any of the individual filters alone. In the future, we will extend PLATO with the MAX statistic to interaction analyses of large-scale genomic datasets.
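The MAX-statistic procedure described in the abstract (run several filters on one SNP, keep the smallest p-value, then calibrate it by permutation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not PLATO's implementation: the abstract does not name the four filter classes, so the four genotype codings below (`additive`, `dominant`, `recessive`, `overdominant`) are hypothetical stand-ins, and each is scored with a simple 1-df chi-square test (N times the squared correlation) chosen here for brevity. The function names, the permutation count, and the `(hits + 1) / (n_perm + 1)` convention are likewise assumptions.

```python
import math
import random


def assoc_p(x, y):
    """p-value of the 1-df chi-square statistic N * r^2, r = Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # no variation, so no evidence of association
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    chi2 = n * sxy * sxy / (sxx * syy)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical stand-in filters: each recodes a genotype (0, 1 or 2 minor alleles).
FILTERS = {
    "additive": lambda g: g,
    "dominant": lambda g: int(g >= 1),
    "recessive": lambda g: int(g == 2),
    "overdominant": lambda g: int(g == 1),
}


def min_p(genotypes, status):
    """Smallest p-value across all filters for one SNP."""
    return min(
        assoc_p([code(g) for g in genotypes], status) for code in FILTERS.values()
    )


def max_stat_p(genotypes, status, n_perm=1000, seed=0):
    """Return (observed minimum p-value, permutation-based empirical p-value)."""
    rng = random.Random(seed)
    observed = min_p(genotypes, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if min_p(genotypes, shuffled) <= observed:
            hits += 1
    # Add-one convention keeps the empirical p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data with a recessive effect: every homozygous carrier is a case.
    genotypes = [0] * 40 + [1] * 40 + [2] * 20
    status = [0, 1] * 20 + [0, 1] * 20 + [1] * 20
    print(max_stat_p(genotypes, status, n_perm=200))
```

Permuting the case/control labels and recomputing the minimum over all filters each time is what accounts for having taken the best of several tests; comparing the raw minimum p-value against a nominal threshold would inflate the Type I error.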