Department of Medical Informatics, Erasmus University Medical Center, 3015 GD Rotterdam, The Netherlands.
Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095-1772, United States.
J Am Med Inform Assoc. 2024 Jun 20;31(7):1514-1521. doi: 10.1093/jamia/ocae109.
This study evaluates regularization variants in logistic regression (L1, L2, ElasticNet, Adaptive L1, Adaptive ElasticNet, Broken adaptive ridge [BAR], and Iterative hard thresholding [IHT]) for discrimination and calibration performance, focusing on both internal and external validation.
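For orientation, the penalties behind these variants can be written in their standard textbook forms. This is a sketch only: the abstract does not state the study's exact parametrization, and the symbols here (negative log-likelihood $-\ell(\beta)$, penalty strength $\lambda$, mixing weight $\alpha$, initial estimate $\tilde\beta$, exponent $\gamma$, sparsity level $K$) are assumptions introduced for illustration.

```latex
\begin{align*}
\text{L1 (lasso):} \quad
  & \min_\beta \; -\ell(\beta) + \lambda \sum_j |\beta_j| \\
\text{L2 (ridge):} \quad
  & \min_\beta \; -\ell(\beta) + \lambda \sum_j \beta_j^2 \\
\text{ElasticNet:} \quad
  & \min_\beta \; -\ell(\beta) + \lambda \Big( \alpha \sum_j |\beta_j|
    + (1-\alpha) \sum_j \beta_j^2 \Big) \\
\text{Adaptive L1:} \quad
  & \min_\beta \; -\ell(\beta) + \lambda \sum_j w_j |\beta_j|,
    \qquad w_j = 1 / |\tilde\beta_j|^{\gamma} \\
\text{BAR (iteration } k\text{):} \quad
  & \beta^{(k)} = \arg\min_\beta \; -\ell(\beta)
    + \lambda \sum_j \frac{\beta_j^2}{\big(\beta_j^{(k-1)}\big)^2} \\
\text{IHT:} \quad
  & \min_\beta \; -\ell(\beta) \quad \text{subject to } \|\beta\|_0 \le K
\end{align*}
```

Adaptive ElasticNet applies the same weights $w_j$ to the L1 term of the ElasticNet penalty. BAR's reweighted ridge iterations approximate an L0 penalty, and IHT constrains the number of nonzero coefficients directly, which is why the two are grouped as L0-based methods.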
We use data from 5 US claims and electronic health record databases and develop models for various outcomes in a major depressive disorder patient population. Each model is developed in one database with a 75%/25% train-test split and then externally validated in the other databases. We evaluate performance in terms of discrimination and calibration, and we test for differences in performance between methods using Friedman's test and critical difference diagrams.
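The Friedman comparison can be illustrated with a minimal pure-Python sketch. It assumes a hypothetical table of AUC values in which rows are prediction tasks and columns are regularization methods. The function names and the numbers are invented for illustration and are not taken from the study. The statistic is the classic chi-square form without tie correction, and the average ranks it is built on are the quantities a critical difference diagram plots.

```python
def average_ranks(scores):
    """Mean rank of each method across tasks (rank 1 = highest score).

    scores[i][j] is the performance of method j on task i.
    Tied scores within a task share the average of their ranks.
    """
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        # Method indices ordered from best to worst score on this task.
        order = sorted(range(n_methods), key=lambda j: -row[j])
        i = 0
        while i < n_methods:
            # Extend k to cover the run of methods tied with position i.
            k = i
            while k + 1 < n_methods and row[order[k + 1]] == row[order[i]]:
                k += 1
            shared_rank = (i + k) / 2 + 1
            for m in range(i, k + 1):
                totals[order[m]] += shared_rank
            i = k + 1
    return [t / len(scores) for t in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction)."""
    n = len(scores)      # number of tasks (blocks)
    k = len(scores[0])   # number of methods (treatments)
    ranks = average_ranks(scores)
    expected = (k + 1) / 2  # mean rank under the null hypothesis
    return 12 * n / (k * (k + 1)) * sum((r - expected) ** 2 for r in ranks)


# Hypothetical AUCs: columns are three methods, rows are four tasks.
example_auc = [
    [0.80, 0.78, 0.75],
    [0.72, 0.70, 0.69],
    [0.85, 0.84, 0.80],
    [0.66, 0.65, 0.61],
]
mean_ranks = average_ranks(example_auc)
statistic = friedman_statistic(example_auc)
```

Under the null hypothesis of no difference between methods, the statistic is approximately chi-square distributed with k - 1 degrees of freedom. In practice a library routine such as `scipy.stats.friedmanchisquare` would also return the p-value.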
Of the 840 models we develop, L1 and ElasticNet emerge as superior in both internal and external discrimination, with a notable AUC difference. BAR and IHT show the best internal calibration, with no clear leader in external calibration. ElasticNet models are typically larger than L1 models. Methods like IHT and BAR, while slightly less discriminative, significantly reduce model complexity.
L1 and ElasticNet offer the best discriminative performance in logistic regression for healthcare predictions and remain robust across internal and external validation. For simpler, more interpretable models, L0-based methods (IHT and BAR) are advantageous: they select fewer features and show better internal calibration. This study aids in selecting suitable regularization techniques for healthcare prediction models, balancing performance, complexity, and interpretability.