Université Bordeaux, ISPED, Centre INSERM U897-Epidemiologie-Biostatistique, Bordeaux, France.
Epidemiology. 2012 Sep;23(5):706-12. doi: 10.1097/EDE.0b013e31825fa528.
Large data sets with many variables provide particular challenges when constructing analytic models. Lasso-related methods provide a useful tool, although one that remains unfamiliar to most epidemiologists.
We illustrate the application of lasso methods in an analysis of the impact of prescribed drugs on the risk of a road traffic crash, using a large French nationwide database (PLoS Med 2010;7:e1000366). In the original case-control study, the authors analyzed each exposure separately. We use the lasso method, which can simultaneously perform estimation and variable selection in a single model. We compare point estimates and confidence intervals using (1) a separate logistic regression model for each drug with a Bonferroni correction and (2) lasso shrinkage logistic regression analysis.
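The idea of performing estimation and variable selection in a single model can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name their software, tuning procedure, or bias-correction method. The sketch fits an L1-penalized logistic regression by proximal gradient descent on synthetic data. The function names (`lasso_logistic`, `simulate`), the penalty values, the step size, and the simulated effect size are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_logistic(X, y, lam, step=0.5, n_iter=300):
    """Minimise mean logistic loss + lam * sum(|beta_j|) by proximal gradient.

    X is a list of rows of 0/1 exposures; the intercept b0 is not penalised.
    step=0.5 is an assumption that is safe for up to 5 binary exposures;
    use a smaller step when there are more columns.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed outcome.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                for j in range(p)]
        b0 -= step * grad0
        # Gradient step followed by soft-thresholding sets small effects to 0.
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return b0, beta


def simulate(n=600, p=5, seed=0):
    """Synthetic data (hypothetical): only exposure 0 raises the log-odds."""
    rng = random.Random(seed)
    X = [[1 if rng.random() < 0.5 else 0 for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(-1.0 + 2.0 * row[0]) else 0 for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    for lam in (0.02, 1.0):
        b0, beta = lasso_logistic(X, y, lam)
        print(lam, [round(b, 3) for b in beta])
```

With the large penalty (`lam=1.0`) every exposure coefficient is shrunk exactly to zero, whereas a small penalty retains the exposure that truly affects the outcome. In practice the penalty is usually tuned, for example by cross-validation, and the separate-models comparison in (1) would instead test each drug at a significance level divided by the number of drugs.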
Shrinkage regression had little effect on (bias-corrected) point estimates, but led to less conservative results, notably for drugs with moderate levels of exposure. Carbamates, carboxamide derivative and fatty acid derivative antiepileptics, drugs used in opioid dependence, and mineral supplements of potassium showed stronger associations.
Lasso is a relevant method for the analysis of databases with a large number of exposures and can be recommended as an alternative to conventional strategies.