DHDIP: An interpretable model for hypertension and hyperlipidemia prediction based on EMR data.

Affiliations

College of Big Data Statistics, Guizhou University of Finance and Economics, Guiyang 550025, PR China; College of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012, PR China.

College of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012, PR China.

Publication information

Comput Methods Programs Biomed. 2022 Nov;226:107088. doi: 10.1016/j.cmpb.2022.107088. Epub 2022 Aug 28.

Abstract

BACKGROUND AND OBJECTIVE

Traditional hypertension and hyperlipidemia prediction models suffer from inconsistent modeling data sources, small sample sizes, and the lack of a uniform standard for the index system, so they fall short of the requirements of clinical application. To address this issue, this work proposes DHDIP, an interpretable hypertension and hyperlipidemia prediction model based on electronic medical record (EMR) data.

METHODS

First, we select massive, high-dimensional, unstructured EMR data as a unified modeling data source and propose a pre-processing algorithm for EMR data, because raw EMR data cannot be processed directly by machine learning algorithms. Second, we build models with several mainstream algorithms, including XGBoost, CatBoost, and RandomForest, and identify the best-suited algorithms through performance comparison. Finally, we introduce the SHAP framework into the DHDIP model to identify the main factors contributing to hypertension and hyperlipidemia, which enhances the interpretability of the model.
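The attribution idea behind the SHAP step can be illustrated with a minimal sketch. SHAP builds on Shapley values, which split the difference between a model's prediction for one sample and its prediction for a baseline among the input features. The code below computes exact Shapley values by enumerating feature coalitions for a toy model. It is not the DHDIP implementation and does not use the SHAP library: the function `toy_risk`, its three features (age, BMI, smoker flag), its coefficients, and the baseline values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this is
    only practical for very small examples.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Coalition features come from x, all others from the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features):
    """Hypothetical risk score; coefficients are made up for illustration."""
    age, bmi, smoker = features
    return 0.01 * age + 0.02 * bmi + 0.1 * smoker + 0.001 * age * smoker


sample = [60, 30, 1]      # hypothetical patient: age, BMI, smoker flag
baseline = [40, 22, 0]    # hypothetical reference patient
phi = shapley_values(toy_risk, sample, baseline)

for name, contribution in zip(["age", "bmi", "smoker"], phi):
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that makes the ranking of features interpretable. For tree ensembles such as XGBoost, the SHAP library provides far more efficient algorithms than this brute-force enumeration.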

RESULTS

The DHDIP model achieves an MSE of 0.0285 and a LOSS value of 0.0054, both better than the values reported in previous studies.
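For readers unfamiliar with the metric, the following is a minimal sketch of how mean squared error (MSE) can be computed and used to pick the best of several candidate models, in the spirit of the performance comparison described in the methods. The model names, labels, and predictions are made-up placeholders, not data from the study, and the abstract does not define the LOSS metric, so it is not reproduced here.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def select_best_model(y_true, predictions_by_model):
    """Return the name of the model with the lowest MSE, plus all scores."""
    scores = {
        name: mean_squared_error(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical held-out labels and predictions from two placeholder models.
y_true = [1, 0, 1, 0]
candidate_predictions = {
    "model_a": [0.9, 0.2, 0.8, 0.1],
    "model_b": [0.6, 0.4, 0.5, 0.5],
}

best_name, mse_scores = select_best_model(y_true, candidate_predictions)
print(best_name, mse_scores)
```

Lower MSE means predictions lie closer to the observed values, so the comparison simply keeps the candidate with the smallest score on held-out data.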

CONCLUSION

The model balances performance and interpretability. Multi-objective learning allows a more thorough analysis and prediction of the condition, which both lowers the cost of disease prediction and aids physicians in clinical diagnosis. The datasets and source code are available at https://github.com/Xiaoyao-Jia/DHDIP.
