Gradient boosted trees with individual explanations: An alternative to logistic regression for viability prediction in the first trimester of pregnancy.

Affiliations

ESAT-STADIUS, Stadius Centre for Dynamical Systems, Signal Processing and Data Analytics (STADIUS), Leuven (Arenberg) Kasteelpark Arenberg 10 - box 2446, Leuven 3001, Belgium.

Tommy's National Early Miscarriage Research Centre, Queen Charlotte's and Chelsea Hospital, Imperial College, Du Cane Road, London W12 0HS, United Kingdom.

Publication information

Comput Methods Programs Biomed. 2022 Jan;213:106520. doi: 10.1016/j.cmpb.2021.106520. Epub 2021 Nov 10.

DOI:10.1016/j.cmpb.2021.106520

PMID:34808532

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8674730/

Abstract

BACKGROUND

Clinical models to predict first trimester viability are traditionally based on multivariable logistic regression (LR), which is not directly interpretable for non-statistical experts such as physicians. Furthermore, LR requires complete datasets and pre-established variable specifications. In this study, we leveraged the internal non-linearity, feature selection and missing-value handling mechanisms of machine learning algorithms, along with a post-hoc interpretability strategy, as potential advantages over LR for clinical modeling.

METHODS

The dataset included 1154 patients with 2377 individual scans and was obtained from a prospective observational cohort study conducted at a hospital in London, UK, from March 2014 to May 2019. The data were split into a training set (70%) and a test set (30%). Parsimonious and complete multivariable models were developed with two algorithms, LR and light gradient boosted machine (LGBM), to predict first trimester viability at 11-14 weeks gestational age (GA). Missing values were handled by multiple imputation where appropriate. The SHapley Additive exPlanations (SHAP) framework was applied to derive individual explanations of the models.
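The SHAP framework builds on Shapley values from cooperative game theory. As a rough illustration of what an "individual explanation" means, the sketch below computes exact Shapley attributions by brute force for a tiny model. Everything in it is an assumption made for illustration: the function `toy_model`, its coefficients, the input values and the single reference point `baseline` are hypothetical and are not taken from the study. Brute-force enumeration is also not how the SHAP library explains tree models; it is used here only because it makes the definition visible.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) relative to predict(baseline).

    Features absent from a coalition are set to their baseline value.
    The cost grows as 2^n, so this is only suitable for tiny examples.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy score with an interaction between the first two inputs.
def toy_model(z):
    heartbeat, sac_diameter, age = z
    return 0.5 * heartbeat + 0.02 * heartbeat * sac_diameter - 0.01 * age


x = [1.0, 20.0, 35.0]         # one illustrative case (made-up values)
baseline = [0.0, 10.0, 30.0]  # illustrative reference point (made-up values)

phi = shapley_values(toy_model, x, baseline)
gap = toy_model(x) - toy_model(baseline)

# Additivity: the per-feature attributions sum to the prediction gap.
print("attributions:", phi)
print("sum of attributions:", sum(phi), "prediction gap:", gap)
```

The interaction term is shared between the two inputs that take part in it, which is why an additive explanation can still reflect interactions that the model has learned.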

RESULTS

The parsimonious LGBM model had discriminative and calibration performance similar to that of the parsimonious LR model (AUC 0.885 vs 0.860; calibration slope 1.19 vs 1.18). The complete models did not outperform the parsimonious models. Unlike LR, LGBM was robust to the presence of missing values and did not require multiple imputation. Decision path plots and feature importance analysis revealed different algorithm behaviors despite similar predictive performance. The main driving variable of the LR model was the pre-specified interaction between fetal heart presence and mean sac diameter. The crown-rump length variable and a proxy variable reflecting the difference between expected and observed GA were the two most important variables of the LGBM model. Finally, while variable interactions must be specified upfront with LR, several interactions learned automatically by the LGBM algorithm were ranked by the SHAP framework among the most important features.
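The two metrics reported above measure different things: AUC summarizes ranking (discrimination), while the calibration slope checks whether predicted probabilities are too extreme or too timid. The sketch below shows one conventional way to compute each, under stated assumptions: the data are simulated, the helper names `auc` and `calibration_slope` are hypothetical, and the slope is taken as the coefficient from a logistic fit of the outcome on the logit of the predicted probability. It is not the evaluation code used in the study.

```python
import math
import random


def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def logit(p):
    return math.log(p / (1.0 - p))


def calibration_slope(labels, probs, iterations=25):
    """Slope b of the logistic fit: outcome ~ a + b * logit(predicted probability).

    Fitted with Newton-Raphson. A slope of 1 is ideal; below 1 suggests
    predictions that are too extreme, above 1 predictions that are too timid.
    """
    xs = [logit(p) for p in probs]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for y, x in zip(labels, xs):
            mu = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = mu * (1.0 - mu)
            g_a += y - mu
            g_b += (y - mu) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return b


# Small hand-checkable AUC example: 3 of the 4 positive/negative pairs are ordered correctly.
small_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75

# Simulated outcomes drawn from known probabilities (illustrative only).
rng = random.Random(0)
true_probs = [rng.uniform(0.05, 0.95) for _ in range(500)]
labels = [1 if rng.random() < p else 0 for p in true_probs]

# An "overconfident" model: same ranking, but every logit is doubled.
overconfident = [1.0 / (1.0 + math.exp(-2.0 * logit(p))) for p in true_probs]

slope_true = calibration_slope(labels, true_probs)
slope_over = calibration_slope(labels, overconfident)
print("AUC (true probs):", auc(labels, true_probs))
print("AUC (overconfident):", auc(labels, overconfident))
print("calibration slopes:", slope_true, slope_over)
```

Doubling every logit leaves the ranking, and hence the AUC, unchanged, but halves the calibration slope. This is why two models can be compared on both metrics without one implying the other.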

CONCLUSIONS

Gradient boosted algorithms performed similarly to carefully crafted LR models in terms of discrimination and calibration for first trimester viability prediction. By handling multi-collinearity, missing values, feature selection and variable interactions internally, the gradient boosted trees algorithm, combined with SHAP, offers a serious alternative to traditional LR models.
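To make "handling missing values internally" more concrete, the sketch below shows one common idea used by tree-based learners: when evaluating a split, rows with a missing value are tried on each side and sent to whichever side gives the lower loss. This is a simplified illustration under assumptions, not LightGBM's actual implementation (which is histogram-based and gradient-driven); the function `best_split_with_missing`, the squared-error criterion and the toy data are all hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_with_missing(xs, ys):
    """Search thresholds on one feature; missing entries (None) are routed
    to whichever side of the split gives the lower total squared error.

    Returns a tuple (loss, threshold, side_for_missing).
    """
    observed = sorted({x for x in xs if x is not None})
    missing_y = [y for x, y in zip(xs, ys) if x is None]
    best = None
    for lo, hi in zip(observed, observed[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x is not None and x <= threshold]
        right = [y for x, y in zip(xs, ys) if x is not None and x > threshold]
        for side in ("left", "right"):
            l = left + missing_y if side == "left" else left
            r = right + missing_y if side == "right" else right
            loss = sse(l) + sse(r)
            if best is None or loss < best[0]:
                best = (loss, threshold, side)
    return best


# Toy data: two rows have no measurement, and both have outcome 1.
xs = [1.0, 2.0, None, 8.0, 9.0, None]
ys = [0, 0, 1, 1, 1, 1]

best = best_split_with_missing(xs, ys)
print(best)  # (0.0, 5.0, 'right')
```

No imputation step is needed in this scheme: the rows with missing measurements stay in the training data and the split itself decides where they go.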
