York T P, Eaves L J
Department of Human Genetics, Medical College of Virginia, Virginia Commonwealth University, Richmond, Virginia, USA.
Genet Epidemiol. 2001;21 Suppl 1:S649-54. doi: 10.1002/gepi.2001.21.s1.s649.
A recently developed analytic approach, Multivariate Adaptive Regression Splines (MARS), was used to identify both genetic and non-genetic factors involved in the etiology of a common disease. We tested this method on the simulated data for the isolated population provided in problem 2 of Genetic Analysis Workshop (GAW) 12. MARS analyzes all inputs simultaneously, in this case DNA sequence variants and non-genetic data, and uses internal cross-validation to prune away variables that contribute little to model fit, arriving at a generalizable predictive model of the response. Factors flagged as relevant by the importance value that MARS computes were assumed to be associated with disease risk. Applied to five sets of 7,000 individuals, a series of subsequent models identified the quantitative traits and a single major gene contributing directly to risk liability.
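The pruning and importance ideas described above can be illustrated with a deliberately simplified sketch. It is not the MARS software or the models used in this study. It fits a single hinge basis function per input, scores each fit with a generalized cross-validation (GCV) criterion, and ranks inputs by how much they lower GCV relative to an intercept-only model. The function names, the toy data, the complexity penalty of 3.0, and the one-hinge-per-variable restriction are all assumptions made for illustration.

```python
def hinge(x, knot):
    """Right-hinge basis function max(0, x - knot)."""
    return max(0.0, x - knot)

def fit_one_hinge(xs, ys, knot):
    """Least-squares fit of y = a + b * hinge(x, knot); returns (a, b, rss)."""
    hs = [hinge(x, knot) for x in xs]
    n = len(xs)
    mh = sum(hs) / n
    my = sum(ys) / n
    shh = sum((h - mh) ** 2 for h in hs)
    # A constant hinge column carries no information, so its slope is zero.
    b = 0.0 if shh == 0.0 else sum((h - mh) * (y - my) for h, y in zip(hs, ys)) / shh
    a = my - b * mh
    rss = sum((y - a - b * h) ** 2 for h, y in zip(hs, ys))
    return a, b, rss

def gcv(rss, n, n_terms, penalty=3.0):
    """Mean squared residual inflated by an assumed model-complexity charge."""
    effective = n_terms + penalty * (n_terms - 1) / 2.0
    return (rss / n) / (1.0 - effective / n) ** 2

def best_hinge(xs, ys):
    """Try each observed value as a knot; return (knot, gcv) with the lowest GCV."""
    best = None
    for knot in sorted(set(xs)):
        _, _, rss = fit_one_hinge(xs, ys, knot)
        score = gcv(rss, len(xs), n_terms=2)
        if best is None or score < best[1]:
            best = (knot, score)
    return best

def importance(columns, ys):
    """GCV reduction of each input relative to an intercept-only model."""
    n = len(ys)
    my = sum(ys) / n
    base = gcv(sum((y - my) ** 2 for y in ys), n, n_terms=1)
    return {name: max(0.0, base - best_hinge(xs, ys)[1])
            for name, xs in columns.items()}

# Hypothetical toy data: the response depends on `trait` through a hinge at 10,
# while `noise` is a shuffled copy of the same values.
trait = [float(i) for i in range(20)]
noise = [float((7 * i) % 20) for i in range(20)]
response = [2.0 * hinge(x, 10.0) for x in trait]

scores = importance({"trait": trait, "noise": noise}, response)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

An input whose best hinge does not lower GCV receives an importance of zero here, which mirrors in miniature how a variable contributing little to fit would be dropped. A full MARS fit additionally builds many basis functions and their interactions in a forward pass before pruning backward.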