Holger Schwender, Katherine Bowers, M. Daniele Fallin, Ingo Ruczinski
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21218, USA.
Ann Hum Genet. 2011 Jan;75(1):122-32. doi: 10.1111/j.1469-1809.2010.00623.x. Epub 2010 Nov 30.
Ensemble methods (such as Bagging and Random Forests) take advantage of unstable base learners (such as decision trees) to improve predictions, and offer measures of variable importance useful for variable selection. LogicFS has been proposed as such an ensemble learner for case-control studies when interactions of single nucleotide polymorphisms (SNPs) are of particular interest. LogicFS uses bootstrap samples of the data and employs the Boolean trees derived in logic regression as base learners to create ensembles of models that allow for the quantification of the contributions of epistatic interactions to the disease risk. In this article, we propose an extension of logicFS suitable for case-parent trio data, and derive an additional importance measure that is much less influenced by linkage disequilibrium between SNPs than the measure originally used in logicFS. We illustrate the performance of the novel procedure in simulation studies and in a case study of 461 case-parent trios with autistic children.
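The bootstrap-and-importance idea described above can be illustrated with a minimal sketch. It is not the logicFS implementation and not the logicFS R package API. As simplifying assumptions, the base learner here is an exhaustive search for the single best two-SNP conjunction (logic regression instead searches over Boolean trees), SNPs are coded as binary indicators, the data are noise-free simulated case-control data rather than trios, and importance is the average gain in correctly classified out-of-bag subjects when the interaction is used instead of a majority-class null model. The function names `simulate`, `fit_best_conjunction`, and `interaction_importance` are hypothetical.

```python
import random
from itertools import combinations


def simulate(n=200, n_snps=6, seed=1):
    """Toy case-control data (assumption): binary SNPs, disease iff SNP1 AND SNP3."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_snps)] for _ in range(n)]
    y = [row[1] & row[3] for row in X]
    return X, y


def fit_best_conjunction(X, y, rows):
    """Stand-in base learner: the SNP pair whose AND classifies `rows` best."""
    best_pair, best_correct = None, -1
    for i, j in combinations(range(len(X[0])), 2):
        correct = sum((X[r][i] & X[r][j]) == y[r] for r in rows)
        if correct > best_correct:
            best_pair, best_correct = (i, j), correct
    return best_pair


def interaction_importance(X, y, n_boot=50, seed=0):
    """Average out-of-bag gain of each selected interaction over a null model."""
    rng = random.Random(seed)
    n = len(X)
    importance = {}
    for _ in range(n_boot):
        # Bootstrap sample (drawn with replacement) and its out-of-bag complement.
        boot = [rng.randrange(n) for _ in range(n)]
        in_bag = set(boot)
        oob = [r for r in range(n) if r not in in_bag]
        if not oob:
            continue
        i, j = fit_best_conjunction(X, y, boot)
        # Null model: predict the majority class of the bootstrap sample.
        majority = 1 if 2 * sum(y[r] for r in boot) > len(boot) else 0
        with_term = sum((X[r][i] & X[r][j]) == y[r] for r in oob)
        without_term = sum(y[r] == majority for r in oob)
        gain = (with_term - without_term) / n_boot
        importance[(i, j)] = importance.get((i, j), 0.0) + gain
    return importance


if __name__ == "__main__":
    X, y = simulate()
    ranked = sorted(interaction_importance(X, y).items(), key=lambda kv: -kv[1])
    for pair, score in ranked:
        print(pair, round(score, 2))
```

In this toy setting the simulated interaction between SNPs 1 and 3 should rank first. The sketch does not address the case-parent trio design or the linkage-disequilibrium-adjusted measure that the article proposes.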