Gradient boosted trees with individual explanations: An alternative to logistic regression for viability prediction in the first trimester of pregnancy.

Affiliations

ESAT-STADIUS, Stadius Centre for Dynamical Systems, Signal Processing and Data Analytics (STADIUS), Leuven (Arenberg) Kasteelpark Arenberg 10 - box 2446, Leuven 3001, Belgium.

Tommy's National Early Miscarriage Research Centre, Queen Charlotte's and Chelsea Hospital, Imperial College, Du Cane Road, London W12 0HS, United Kingdom.

Publication details

Comput Methods Programs Biomed. 2022 Jan;213:106520. doi: 10.1016/j.cmpb.2021.106520. Epub 2021 Nov 10.

Abstract

BACKGROUND

Clinical models to predict first trimester viability are traditionally based on multivariable logistic regression (LR), which is not directly interpretable by non-statistical experts such as physicians. Furthermore, LR requires complete datasets and pre-established variable specifications. In this study, we leveraged the internal non-linearity, feature selection and missing-value handling mechanisms of machine learning algorithms, together with a post-hoc interpretability strategy, as potential advantages over LR for clinical modeling.
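Here is what "pre-established variable specifications" means for LR in practice. Suppose the effect of mean sac diameter (MSD) should differ by fetal heart presence (FH, coded 1 if present, 0 otherwise). The product term then has to be written into the model before fitting. The equation below is a generic sketch only: the coefficients $\beta_k$, $\gamma_j$ and the extra covariates $z_j$ are placeholders, not the published model.

```latex
\operatorname{logit}\,\Pr(\text{viable} \mid \mathbf{x})
  = \beta_0 + \beta_1\,\mathrm{FH} + \beta_2\,\mathrm{MSD}
  + \beta_3\,(\mathrm{FH} \times \mathrm{MSD}) + \sum_{j} \gamma_j z_j
```

Under this form, the change in log-odds per unit of MSD is $\beta_2$ when FH = 0 and $\beta_2 + \beta_3$ when FH = 1. If the product term is left out, the model cannot express that difference. A tree ensemble, by contrast, can approximate it through successive splits on FH and MSD.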

METHODS

The dataset included 2377 individual scans from 1154 patients and was obtained from a prospective observational cohort study conducted at a hospital in London, UK, from March 2014 to May 2019. The data were split into a training set (70%) and a test set (30%). Parsimonious and complete multivariable models were developed with two algorithms, LR and light gradient boosted machine (LGBM), to predict first trimester viability at 11-14 weeks gestational age (GA). Missing values were handled by multiple imputation where appropriate. The SHapley Additive exPlanations (SHAP) framework was applied to derive individual explanations of the models.
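The abstract does not say how the SHAP values were computed. The sketch below is a hedged illustration of what an "individual explanation" is: it computes exact Shapley values by brute force for a toy model. A feature that is "absent" from a coalition is set to a single fixed baseline value, which is a simplification. The `shap` package instead typically relies on a background dataset or, for tree ensembles, on the tree structure via its TreeSHAP algorithm. The function name `exact_shapley` and the toy model are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of model f at input x.

    A feature that is "absent" from a coalition is set to its baseline value.
    All 2^(n-1) coalitions are enumerated per feature, so this is only
    practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


if __name__ == "__main__":
    # Hypothetical toy model with an interaction term between the two features.
    def model(v: Sequence[float]) -> float:
        return 0.5 * v[0] + 2.0 * v[1] + 1.5 * v[0] * v[1]

    phi = exact_shapley(model, x=[1.0, 2.0], baseline=[0.0, 0.0])
    print(phi)  # [2.0, 5.5]; sums to model(x) - model(baseline) = 7.5
```

At this input the interaction term contributes 1.5 × 1.0 × 2.0 = 3.0. The Shapley values split it equally, giving 1.5 to each feature, on top of each feature's own main effect (0.5 and 4.0). The values always add up to the difference between the prediction and the baseline prediction. This additivity is what lets SHAP attribute one patient's prediction to individual variables.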

RESULTS

The parsimonious LGBM model had discrimination and calibration similar to the parsimonious LR model (AUC 0.885 vs 0.860; calibration slope 1.19 vs 1.18). The complete models did not outperform the parsimonious models. Unlike LR, LGBM was robust to missing values and did not require multiple imputation. Decision path plots and feature importance analysis revealed different algorithm behaviors despite similar predictive performance. The main driving variable of the LR model was the pre-specified interaction between fetal heart presence and mean sac diameter. For LGBM, the two most important variables were crown-rump length and a proxy variable reflecting the difference between expected and observed GA. Finally, whereas variable interactions must be specified upfront in LR, the SHAP framework ranked several interactions that the LGBM algorithm had learned automatically among its most important features.
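The two performance metrics quoted above can be computed as in the following minimal, stdlib-only sketch. It assumes binary outcomes coded 0/1 and predicted probabilities strictly between 0 and 1. AUC is computed as a concordance probability. The calibration slope is the coefficient `b` of a logistic recalibration model, logit P(y = 1) = a + b · logit(p), fitted here by Newton-Raphson. The function names are hypothetical, and the abstract does not describe the study's exact procedure.

```python
import math
from typing import Sequence


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """AUC as a concordance probability: the chance that a randomly chosen
    positive case gets a higher predicted probability than a randomly chosen
    negative case (ties count 1/2). O(n_pos * n_neg), fine for a sketch."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_slope(y: Sequence[int], p: Sequence[float],
                      max_iter: int = 50) -> tuple[float, float]:
    """Fit logit P(y=1) = a + b * logit(p) by Newton-Raphson and return (a, b).

    b is the calibration slope. Assumes 0 < p < 1 and at least two distinct
    predictions (otherwise the information matrix is singular).
    """
    lp = [math.log(pi / (1.0 - pi)) for pi in p]  # predictions on the logit scale
    a, b = 0.0, 1.0  # start from "perfect calibration"
    for _ in range(max_iter):
        g_a = g_b = i_aa = i_ab = i_bb = 0.0
        for yi, xi in zip(y, lp):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g_a += yi - mu          # score (gradient of the log-likelihood)
            g_b += (yi - mu) * xi
            i_aa += w               # information matrix (negative Hessian)
            i_ab += w * xi
            i_bb += w * xi * xi
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: (da, db) = I^{-1} g, via the closed-form 2x2 inverse
        da = (i_bb * g_a - i_ab * g_b) / det
        db = (i_aa * g_b - i_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) < 1e-10 and abs(db) < 1e-10:
            break
    return a, b


if __name__ == "__main__":
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
    # Toy data whose observed event rates (0.8, 0.2) are more extreme than the
    # predictions (2/3, 1/3): the fitted slope is about 2.
    y = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
    p = [2 / 3] * 5 + [1 / 3] * 5
    print(calibration_slope(y, p))
```

By convention, a slope below 1 indicates predictions that are too extreme, which is typical of overfitting. A slope above 1, as reported for both parsimonious models, indicates predictions that are somewhat too moderate on the test set.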

CONCLUSIONS

Gradient boosted algorithms performed similarly to carefully crafted LR models in terms of discrimination and calibration for first trimester viability prediction. By handling multi-collinearity, missing values, feature selection and variable interactions internally, the gradient boosted trees algorithm, combined with SHAP, offers a serious alternative to traditional LR models.
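Handling missing values internally can be illustrated with a single decision split. The sketch below is a simplified, hypothetical illustration of the "default direction" idea used by gradient boosting libraries such as LightGBM. During the threshold search, rows with a missing value are tried on both sides of the split and sent to whichever side gives the lower loss, so no imputation is needed. For readability, it scores splits by squared error. LightGBM itself scores them with gradient and Hessian statistics. The function name is an assumption.

```python
import math
from typing import Optional, Sequence


def best_split_with_missing(x: Sequence[Optional[float]], y: Sequence[float]
                            ) -> tuple[float, Optional[float], Optional[bool]]:
    """Return (loss, threshold, missing_goes_left) for the best single split.

    None marks a missing value. For every candidate threshold, rows with a
    missing x are tried on the left and on the right, and the side with the
    lower squared error is kept, so no imputation is required.
    """
    def sse(vals: list) -> float:
        # sum of squared errors around the mean (a leaf predicts the mean)
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    observed = sorted({xi for xi in x if xi is not None})
    missing_y = [yi for xi, yi in zip(x, y) if xi is None]
    best: tuple[float, Optional[float], Optional[bool]] = (math.inf, None, None)
    # candidate thresholds halfway between consecutive distinct observed values
    for lo, hi in zip(observed, observed[1:]):
        t = (lo + hi) / 2
        left_y = [yi for xi, yi in zip(x, y) if xi is not None and xi <= t]
        right_y = [yi for xi, yi in zip(x, y) if xi is not None and xi > t]
        for missing_left in (True, False):
            if missing_left:
                loss = sse(left_y + missing_y) + sse(right_y)
            else:
                loss = sse(left_y) + sse(right_y + missing_y)
            if loss < best[0]:
                best = (loss, t, missing_left)
    return best


if __name__ == "__main__":
    # Missing x values behave like high values here (their y is 1).
    x = [1.0, 2.0, None, 8.0, 9.0, None]
    y = [0, 0, 1, 1, 1, 1]
    print(best_split_with_missing(x, y))  # (0.0, 5.0, False)
```

Because the direction for missing values is learned separately at every split, a boosted ensemble can exploit the fact that a value is missing without any prior imputation step.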


Figure (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7925/8674730/40385b17ccc1/gr1.jpg
