Shi Weiliang, Wahba Grace, Wright Stephen, Lee Kristine, Klein Ronald, Klein Barbara
Department of Statistics, University of Wisconsin, 1300 University Avenue, Madison WI 53706, E-mail address:
Stat Interface. 2008;1(1):137-153. doi: 10.4310/sii.2008.v1.n1.a12.
The LASSO-Patternsearch algorithm is proposed to efficiently identify patterns of multiple dichotomous risk factors for outcomes of interest in demographic and genomic studies. The patterns considered are those that arise naturally from the log-linear expansion of the multivariate Bernoulli density. The method is designed for the case where there is a possibly very large number of candidate patterns but it is believed that only a relatively small number are important. A LASSO is used to greatly reduce the number of candidate patterns, using a novel computational algorithm that can handle an extremely large number of unknowns simultaneously. The patterns surviving the LASSO are further pruned in the framework of (parametric) generalized linear models. A novel tuning procedure based on the GACV for Bernoulli outcomes, modified to act as a model selector, is used at both steps. We applied the method to myopia data from the population-based Beaver Dam Eye Study, exposing physiologically interesting interacting risk factors. We then applied the method to data from a generative model of rheumatoid arthritis based on Problem 3 from the Genetic Analysis Workshop 15, successfully demonstrating its potential to efficiently recover higher-order patterns from attribute vectors of length typical of genomic studies.
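The first step described above (an l1-penalized Bernoulli likelihood over candidate patterns) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_patterns` and `l1_logistic` are hypothetical, a plain proximal-gradient (ISTA) solver stands in for the paper's novel large-scale algorithm, the penalty `lam` is fixed by hand rather than tuned by GACV, the second (GLM pruning) step is omitted, and the data are simulated purely for illustration. Patterns are taken to be products of up to `max_order` binary risk factors, which is one reading of the log-linear expansion.

```python
from itertools import combinations

import numpy as np


def build_patterns(X, max_order=2):
    """Return (B, labels): one 0/1 column per product of up to max_order risk factors."""
    n, p = X.shape
    cols, labels = [], []
    for order in range(1, max_order + 1):
        for idx in combinations(range(p), order):
            # A pattern is "on" only when every factor in idx equals 1.
            cols.append(X[:, list(idx)].prod(axis=1))
            labels.append(idx)
    return np.column_stack(cols).astype(float), labels


def l1_logistic(B, y, lam, n_iter=2000):
    """ISTA fit of logit P(y=1) = mu + B @ c, penalizing lam * ||c||_1 (mu unpenalized)."""
    n, m = B.shape
    mu, c = 0.0, np.zeros(m)
    # Upper bound on the Lipschitz constant of the mean logistic-loss gradient.
    lipschitz = 0.25 * (np.linalg.norm(B, 2) ** 2 / n + 1.0)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(mu + B @ c)))
        resid = prob - y
        mu -= step * resid.mean()
        z = c - step * (B.T @ resid) / n
        # Soft-thresholding is the proximal operator of the l1 penalty.
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return mu, c


# Simulated data (an assumption for illustration): risk is raised only when
# factors 0 and 1 are both present.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 6))
logit = -2.0 + 3.0 * X[:, 0] * X[:, 1]
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

B, labels = build_patterns(X, max_order=2)
mu, c = l1_logistic(B, y, lam=0.05)
survivors = [lab for lab, coef in zip(labels, c) if abs(coef) > 1e-8]
print("patterns surviving the LASSO step:", survivors)
```

In this sketch the surviving patterns would then be passed to an ordinary (unpenalized) logistic regression for further pruning, and `lam` would be chosen by a data-driven criterion rather than fixed; both are left out here because the abstract does not give enough detail to reproduce them.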