From the Team of Environmental Epidemiology, IAB, Institute for Advanced Biosciences, Inserm, CNRS, CHU-Grenoble-Alpes, University Grenoble-Alpes, Grenoble, France.
Epidemiology. 2021 May 1;32(3):402-411. doi: 10.1097/EDE.0000000000001340.
Machine-learning algorithms are increasingly used in epidemiology to identify true predictors of a health outcome when many potential predictors are measured. However, these algorithms can provide different outputs when repeatedly applied to the same dataset, which can compromise research reproducibility. We aimed to illustrate that commonly used algorithms are unstable and, using the example of the Least Absolute Shrinkage and Selection Operator (LASSO), that the choice of stabilization method is crucial.
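The instability described above can be reproduced in a minimal sketch (not the authors' simulation): fitting a cross-validation-tuned LASSO to the same data with different random fold splits can return different sets of selected variables. The sample size, number of predictors, and effect sizes below are illustrative assumptions.

```python
# Minimal sketch of LASSO selection instability: identical data,
# different CV fold splits, potentially different selected variables.
# All sizes and effect magnitudes are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 0.5                                 # 5 weak true predictors
y = X @ beta + 2.0 * rng.standard_normal(n)    # low explained outcome variance

supports = []
for seed in range(5):
    # Only the cross-validation fold split changes between re-runs
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    fit = LassoCV(cv=cv).fit(X, y)
    supports.append(frozenset(np.flatnonzero(fit.coef_)))

n_distinct = len(set(supports))    # > 1 indicates an unstable selection
print(f"{n_distinct} distinct variable sets across 5 re-runs on the same data")
```

When the predictors explain little of the outcome variability, as in the low-signal scenarios studied here, the penalty chosen by cross-validation is sensitive to the fold split, so the selected set typically varies across re-runs.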
In a simulation study, we tested the stability and performance of widely used machine-learning algorithms (LASSO, Elastic-Net, and Deletion-Substitution-Addition [DSA]). We then assessed the effectiveness of six methods to stabilize LASSO and their impact on performance. We assumed that a linear combination of factors drawn from a simulated set of 173 quantitative variables assessed in 1,301 subjects influenced a continuous health outcome to varying extents. We assessed model stability, sensitivity, and false discovery proportion.
All tested algorithms were unstable. For LASSO, stabilization methods improved stability without ensuring perfect stability, a finding confirmed by application to an exposome study. Stabilization methods also affected performance. Specifically, stabilization based on hyperparameter optimization, frequently implemented in epidemiology, dramatically increased the false discovery proportion when predictors explained a low share of outcome variability. In contrast, stabilization based on the stability selection procedure often decreased the false discovery proportion, although it sometimes simultaneously lowered sensitivity.
The instability of machine-learning methods should concern epidemiologists who rely on them for variable selection, because stabilizing a model can affect its performance. For LASSO, stabilization methods based on the stability selection procedure (rather than on prediction stability) should be preferred for identifying true predictors.