Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui, China.
Bioinformatics. 2010 Aug 1;26(15):1871-8. doi: 10.1093/bioinformatics/btq290. Epub 2010 Jun 10.
The multifactor-dimensionality reduction (MDR) method has been widely used in multi-locus interaction analysis. It reduces dimensionality by partitioning the multi-locus genotypes into a high-risk group and a low-risk group according to whether the genotype-specific risk ratio exceeds a fixed threshold. Alternatively, one can maximize the χ² value exhaustively over all possible ways of partitioning the multi-locus genotypes into two groups; we aim to show that this is computationally feasible.
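The fixed-threshold partition can be illustrated with a minimal sketch. It assumes that each multi-locus genotype is summarized by a case count and a control count, that the risk ratio is the case-to-control ratio, and that the threshold is 1; the function names `chi2_2x2` and `mdr_partition` are hypothetical and are not taken from the authors' program.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: one margin is empty
    return n * (a * d - b * c) ** 2 / denom


def mdr_partition(cases, controls, threshold=1.0):
    """Label genotype i high-risk when cases[i] / controls[i] > threshold.

    The comparison is written as cases > threshold * controls so that
    genotypes with zero controls need no special handling.
    Returns (indices of high-risk genotypes, chi-square of the 2x2 table
    formed by high-risk versus low-risk and case versus control).
    """
    high = [i for i, (r, s) in enumerate(zip(cases, controls)) if r > threshold * s]
    a = sum(cases[i] for i in high)      # cases in the high-risk group
    b = sum(controls[i] for i in high)   # controls in the high-risk group
    c = sum(cases) - a                   # cases in the low-risk group
    d = sum(controls) - b                # controls in the low-risk group
    return high, chi2_2x2(a, b, c, d)
```

With this rule the partition is determined by the threshold alone, so the resulting χ² value need not be the largest attainable over all two-group partitions.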
We advocate finding the optimal MDR (OMDR) that would have resulted from an exhaustive search over all possible ways of partitioning the multi-locus genotypes into two groups. We show that this optimal MDR can be obtained efficiently using an ordered combinatorial partitioning (OCP) method, which differs from the existing MDR method in its use of a data-driven rather than a fixed threshold. Generalized extreme value distribution (GEVD) theory is applied to find the optimal order of gene combination and to assess the statistical significance of interactions.
The computational complexity of the OCP strategy is linear in the number of multi-locus genotypes, in contrast with the exponential order of the naive exhaustive search strategy. Simulation studies show that OMDR can be more powerful than MDR, with substantial power gain possible when the partitioning of OMDR differs from that of MDR. Analysis of a breast cancer dataset shows that the use of GEVD accelerates the determination of interaction order and reduces the time cost of P-value calculation by more than 10-fold.
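The contrast between the ordered scan and the exhaustive search can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: genotypes are assumed to be ordered by their case fraction, each split point along that ordering is treated as a candidate data-driven threshold, and the names `ocp_best_partition` and `exhaustive_best_chi2` are hypothetical. After sorting, only m - 1 splits are evaluated for m non-empty genotypes, whereas the exhaustive search visits every subset.

```python
from itertools import combinations


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def ocp_best_partition(cases, controls):
    """Scan the splits of the genotypes ordered by case fraction (assumed ordering).

    Returns (sorted indices of the high-risk group, best chi-square).
    """
    total_r, total_s = sum(cases), sum(controls)
    # Empty genotype cells carry no information and are skipped.
    cells = [i for i in range(len(cases)) if cases[i] + controls[i] > 0]
    order = sorted(cells, key=lambda i: cases[i] / (cases[i] + controls[i]), reverse=True)
    best_chi2, best_k = 0.0, 0
    a = b = 0  # running case/control counts of the candidate high-risk group
    for k, i in enumerate(order[:-1], start=1):
        a += cases[i]
        b += controls[i]
        chi2 = chi2_2x2(a, b, total_r - a, total_s - b)
        if chi2 > best_chi2:
            best_chi2, best_k = chi2, k
    return sorted(order[:best_k]), best_chi2


def exhaustive_best_chi2(cases, controls):
    """Naive reference: maximize chi-square over every non-trivial subset."""
    n = len(cases)
    total_r, total_s = sum(cases), sum(controls)
    best = 0.0
    for size in range(1, n):
        for group in combinations(range(n), size):
            a = sum(cases[i] for i in group)
            b = sum(controls[i] for i in group)
            best = max(best, chi2_2x2(a, b, total_r - a, total_s - b))
    return best
```

On small inputs the two functions can be compared directly, for example `ocp_best_partition(cases, controls)[1]` against `exhaustive_best_chi2(cases, controls)`; the exhaustive version is included only as a check and becomes impractical as the number of genotypes grows.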
A C++ program is available at http://home.ustc.edu.cn/~zhanghan/ocp/ocp.html