Opening the black box: interpretable machine learning for predictor finding of metabolic syndrome.

Affiliation

Department of Epidemiology and Health Statistics, College of Public Health, Xinjiang Medical University, Urumqi, Xinjiang, China.

Publication information

BMC Endocr Disord. 2022 Aug 26;22(1):214. doi: 10.1186/s12902-022-01121-4.

DOI:10.1186/s12902-022-01121-4

PMID:36028865

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9419421/

Abstract

OBJECTIVE

The internal workings of machine learning algorithms are complex, and such models are regarded as poorly interpretable "black boxes," which makes it difficult for domain experts to understand and trust them. This study uses metabolic syndrome (MetS) as a case study to evaluate how useful model interpretability methods are for explaining otherwise hard-to-interpret predictive models.

METHODS

The study collected data from a chain of health examination institutions in Urumqi from 2017 to 2019; 39,134 records remained after preprocessing steps such as deletion and imputation. Recursive feature elimination (RFE) was used for feature selection to reduce redundancy. MetS risk prediction models (logistic regression, random forest, XGBoost) were built on the selected feature subset, and accuracy, sensitivity, specificity, Youden index, and AUROC were used to evaluate classification performance. Post-hoc, model-agnostic interpretation methods (variable importance, LIME) were used to interpret the results of the predictive models.
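
The evaluation metrics named above can be illustrated with a minimal Python sketch. This is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration. The Youden index is computed as sensitivity + specificity - 1, and AUROC as the share of positive-negative pairs in which the positive case receives the higher score (ties count as half).

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded 1 = MetS, 0 = no MetS."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute the five metrics at an assumed decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
        "auroc": auroc(y_true, y_score),
    }


# Toy data (hypothetical, not from the study).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
metrics = classification_metrics(labels, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On this toy data, 15 of the 16 positive-negative pairs are ranked correctly, so the AUROC is 0.9375 even though accuracy at the 0.5 threshold is only 0.75. Threshold-dependent and threshold-free metrics can therefore rank models differently.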

RESULTS

RFE selected eighteen physical examination indicators, reducing redundancy in the physical examination data. The random forest and XGBoost models had higher accuracy, sensitivity, specificity, Youden index, and AUROC than logistic regression, and XGBoost had higher sensitivity, Youden index, and AUROC than random forest. Variable importance, LIME, and partial dependence plots (PDP) were used for global and local interpretation of the best-performing MetS risk prediction model (XGBoost). The different interpretation methods offered different insights into the model's results; because they are model-agnostic, they allow flexibility in model selection, and they can visualize how and why the model makes its decisions. The interpretable risk prediction model helped identify risk factors associated with MetS: in addition to traditional risk factors such as overweight and obesity, hyperglycemia, hypertension, and dyslipidemia, MetS was also associated with age, creatinine, uric acid, and alkaline phosphatase.
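
To make the idea of model-agnostic global interpretation concrete, here is a minimal Python sketch of two such techniques: permutation-based variable importance and partial dependence. It is a sketch under stated assumptions, not the authors' implementation: the `toy_model` risk function, the feature layout (feature 0 as a waist-like measurement, feature 1 as pure noise), and all data values are hypothetical, and the study's own variable-importance method may differ from the permutation approach shown here. Both functions only call the model as a black box, which is what makes them applicable to any classifier.

```python
import math
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], float]  # any fitted model wrapped as: row -> predicted risk


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    """Share of rows where the thresholded prediction (>= 0.5) matches the label."""
    return sum((model(row) >= 0.5) == bool(label) for row, label in zip(X, y)) / len(y)


def permutation_importance(
    model: Model, X: Sequence[Row], y: Sequence[int], n_repeats: int = 20, seed: int = 0
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def partial_dependence(model: Model, X: Sequence[Row], j: int, grid: Sequence[float]) -> List[float]:
    """Average prediction when feature j is set to each grid value for every row."""
    return [sum(model(row[:j] + [v] + row[j + 1:]) for row in X) / len(X) for v in grid]


def toy_model(row: Row) -> float:
    """Hypothetical black box: risk rises with feature 0 and ignores feature 1."""
    return 1.0 / (1.0 + math.exp(-(row[0] - 90.0) / 5.0))


# Hypothetical data: feature 0 is informative, feature 1 is noise.
X = [[78.0, 0.3], [82.0, 0.9], [85.0, 0.1], [88.0, 0.7],
     [93.0, 0.4], [97.0, 0.8], [101.0, 0.2], [106.0, 0.6]]
y = [1 if toy_model(row) >= 0.5 else 0 for row in X]

print("permutation importance:", permutation_importance(toy_model, X, y))
print("partial dependence on feature 0:", partial_dependence(toy_model, X, 0, [80.0, 90.0, 100.0]))
```

Because the toy model ignores feature 1, shuffling that column never changes a prediction, so its importance is exactly zero. The partial dependence on feature 0 rises across the grid, which is the kind of global pattern a PDP is meant to expose.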

CONCLUSION

Applying model interpretability methods to black-box models not only allows flexibility in the choice of model but also compensates for the models' lack of interpretability. These methods can also serve as a novel means of identifying variables that are likely to be good predictors.
