Lin Hui-Yi, Chen Y Ann, Tsai Ya-Yu, Qu Xiaotao, Tseng Tung-Sung, Park Jong Y
H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA.
Ann Hum Genet. 2012 Jan;76(1):53-62. doi: 10.1111/j.1469-1809.2011.00692.x. Epub 2011 Dec 11.
Studies have shown that interactions among single nucleotide polymorphisms (SNPs) may play an important role in understanding the causes of complex disease. We propose an integrated approach that combines two machine learning methods, Random Forests (RF) and Multivariate Adaptive Regression Splines (MARS), to identify a subset of important SNPs and detect interaction patterns more effectively and efficiently. In this two-stage RF-MARS (TRM) approach, RF is first applied to detect a predictive subset of SNPs, and MARS is then used to identify the interaction patterns within that subset. We evaluated the performance of TRM under four models. RF variable selection was based on either the out-of-bag classification error rate (OOB) or the variable importance spectrum (IS). Our results indicate that RF(OOB) outperformed MARS and RF(IS) in detecting important variables. This study demonstrates that TRM(OOB), which is RF(OOB) followed by MARS, combines the strengths of RF and MARS in identifying SNP-SNP interactions in a scenario of 100 candidate SNPs. TRM(OOB) had a higher true positive rate and a lower false positive rate than MARS, particularly when searching for interactions strongly associated with the outcome. Therefore, TRM(OOB) is favored for exploring SNP-SNP interactions in large-scale genetic variation studies.
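The comparison above rests on true positive and false positive rates. As a minimal sketch of how such rates can be computed for a variable-selection result, the Python snippet below uses one common definition: the fraction of truly causal SNPs that were selected, and the fraction of non-causal SNPs that were selected. The abstract does not spell out the exact definitions used in the study, so these may differ from the authors'. The function name `selection_rates` and the example SNP indices are hypothetical, not data from the paper.

```python
def selection_rates(selected, causal, n_candidates):
    """Return (TPR, FPR) for a set of selected SNP indices.

    Assumed definitions (not taken from the paper):
      TPR = selected causal SNPs / all causal SNPs
      FPR = selected non-causal SNPs / all non-causal SNPs
    """
    selected, causal = set(selected), set(causal)
    n_null = n_candidates - len(causal)
    if not causal or n_null <= 0:
        raise ValueError("need at least one causal and one non-causal SNP")
    true_pos = len(selected & causal)    # causal SNPs that were picked
    false_pos = len(selected - causal)   # picked SNPs that are not causal
    return true_pos / len(causal), false_pos / n_null


# Hypothetical example: 100 candidate SNPs, SNPs 0 and 1 truly interact.
causal_snps = {0, 1}
picked = {0, 1, 17, 42}  # made-up output of some selection method
tpr, fpr = selection_rates(picked, causal_snps, n_candidates=100)
print(f"TPR={tpr:.2f}, FPR={fpr:.3f}")
```

In a simulation study, rates like these would typically be averaged over many simulated data sets for each method being compared.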