Park Mee Young, Hastie Trevor
Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA.
Biostatistics. 2008 Jan;9(1):30-50. doi: 10.1093/biostatistics/kxm010. Epub 2007 Apr 11.
We propose using a variant of logistic regression (LR) with L2-regularization to fit gene-gene and gene-environment interaction models. Studies have shown that many common diseases are influenced by the interaction of certain genes. LR models with quadratic penalization not only correctly characterize the influential genes along with their interaction structures but also yield additional benefits in handling high-dimensional, discrete factors with a binary response. We illustrate the advantages of using an L2-regularization scheme and compare its performance with that of "multifactor dimensionality reduction" and "FlexTree," two recent tools for identifying gene-gene interactions. Through simulated and real data sets, we demonstrate that our method outperforms other methods in the identification of the interaction structures as well as in prediction accuracy. In addition, we validate the significance of the factors selected through bootstrap analyses.
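As a rough illustration of the idea in the abstract, the sketch below fits a logistic regression with a quadratic (L2) penalty to dummy-coded discrete factors and their pairwise interactions. It is not the authors' implementation: the function names (`one_hot`, `design_matrix`, `fit_l2_logistic`), the three-level genotype coding, the simulated data, and the penalty value `lam` are all assumptions made for this example, and the plain Newton-Raphson solver stands in for whatever fitting procedure the paper actually uses.

```python
import itertools
import numpy as np


def one_hot(column, n_levels=3):
    """Dummy-code one discrete factor (e.g. a genotype coded 0/1/2), keeping every level."""
    return np.eye(n_levels)[column]


def design_matrix(G, n_levels=3):
    """Build [intercept | main-effect dummies | pairwise interaction dummies]."""
    n, p = G.shape
    mains = [one_hot(G[:, j], n_levels) for j in range(p)]
    blocks = [np.ones((n, 1))] + mains
    for j, k in itertools.combinations(range(p), 2):
        # Row-wise outer product of two dummy blocks: one column per level pair.
        inter = (mains[j][:, :, None] * mains[k][:, None, :]).reshape(n, -1)
        blocks.append(inter)
    return np.hstack(blocks)


def fit_l2_logistic(X, y, lam=1.0, n_iter=50, tol=1e-8):
    """Minimize the negative log-likelihood plus (lam / 2) * ||beta[1:]||^2 by Newton-Raphson."""
    d = X.shape[1]
    beta = np.zeros(d)
    P = lam * np.eye(d)
    P[0, 0] = 0.0  # assumption: the intercept is left unpenalized
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        grad = X.T @ (y - mu) - P @ beta
        # The penalty keeps this matrix invertible even though the full
        # dummy coding makes the columns of X linearly dependent.
        hess = X.T @ (X * w[:, None]) + P
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical simulated data: three 3-level factors, with risk raised only
# when factors 0 and 1 both carry a non-zero level (a pure interaction).
rng = np.random.default_rng(0)
n = 200
G = rng.integers(0, 3, size=(n, 3))
logit = -1.0 + 2.0 * ((G[:, 0] > 0) & (G[:, 1] > 0))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

X = design_matrix(G)
beta = fit_l2_logistic(X, y, lam=1.0)
print(X.shape, float(np.linalg.norm(beta[1:])))
```

Because every level of every factor gets its own column, the unpenalized problem would have no unique solution; in this sketch the quadratic penalty is what makes the Newton system solvable. Increasing `lam` shrinks the penalized coefficients toward zero.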