A holistic framework for intradialytic hypotension prediction using generative adversarial networks-based data balancing.

Authors

Lin Hsuan-Ming, Lyu JrJung

Affiliations

Institute of Information Management, National Cheng Kung University, Tainan, Taiwan.

Internal Medicine, Nephrology Division, An Nan Hospital, China Medical University, Tainan, Taiwan.

Publication

BMC Med Inform Decis Mak. 2025 Jul 10;25(1):257. doi: 10.1186/s12911-025-03094-5.

Abstract

BACKGROUND

Intradialytic Hypotension (IDH) is a frequent complication in hemodialysis, yet predictive modeling is challenged by class imbalance. Traditional oversampling methods often struggle with complex clinical data. This study evaluates an enhanced conditional Wasserstein Generative Adversarial Network with Gradient Penalty (CWGAN-GP) framework to improve IDH prediction by generating high-utility synthetic data for balancing.
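For orientation, the critic objective of a WGAN-GP, as it is commonly formulated, can be written with a conditioning label. The abstract does not give the authors' exact loss, so the conditional form and every symbol below (critic D, label y, penalty weight λ, real and generated distributions P_r and P_g) are assumptions made for illustration only.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[ D(\tilde{x} \mid y) \big]
    - \mathbb{E}_{x \sim P_r}\big[ D(x \mid y) \big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[ \big( \lVert \nabla_{\hat{x}} D(\hat{x} \mid y) \rVert_2 - 1 \big)^2 \Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The last term is the gradient penalty: it pushes the critic's gradient norm toward 1 at points interpolated between real and generated samples, which replaces weight clipping as the way to enforce the Lipschitz constraint.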

METHODS

A CWGAN-GP was developed using multi-level hemodialysis data. Following rigorous preprocessing, including a strict temporal train-test split, the CWGAN-GP generated minority-class samples from the training data only. eXtreme Gradient Boosting (XGBoost) models were trained on the original imbalanced data and on datasets balanced with the proposed CWGAN-GP method, and were benchmarked against the traditional Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling Approach (ADASYN) balancing methods. Performance was evaluated using metrics sensitive to imbalance (e.g., Precision-Recall Area Under the Curve) and statistical comparisons, with SHapley Additive exPlanations (SHAP) analysis for interpretability.
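The ordering of steps described here (split by time first, then balance only the training portion) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields `time`, `sbp` and `idh`, the toy data, and the helper names are all hypothetical, and `resample_placeholder` simply copies real minority records where a trained CWGAN-GP generator would be used.

```python
import random
from collections import Counter

def temporal_split(records, train_fraction=0.75):
    """Sort by time and cut once, so all training records precede all test records."""
    ordered = sorted(records, key=lambda r: r["time"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

def balance_training_set(train, make_synthetic, seed=0):
    """Append synthetic minority records until both classes have equal counts."""
    rng = random.Random(seed)
    counts = Counter(r["idh"] for r in train)
    minority = [r for r in train if r["idh"] == 1]
    needed = max(counts[0] - counts[1], 0)
    synthetic = [make_synthetic(minority, rng) for _ in range(needed)]
    return train + synthetic

def resample_placeholder(minority, rng):
    """Hypothetical stand-in for a generator: copies a real minority record."""
    return dict(rng.choice(minority), synthetic=True)

# Toy records (assumed structure): one IDH event every seventh time step.
records = [{"time": t, "sbp": 120 - (t % 7), "idh": int(t % 7 == 0)}
           for t in range(100)]

train, test = temporal_split(records)
balanced = balance_training_set(train, resample_placeholder)
balanced_counts = Counter(r["idh"] for r in balanced)
```

Because the generator only ever sees `train`, no information from later sessions leaks into the synthetic samples, and the test set keeps its natural class imbalance.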

RESULTS

The study population consisted of 40 chronic hemodialysis patients (45% male, mean age 66.30 ± 10.68 years). In the initial dataset, IDH events occurred in 14.85% of records (19,124 instances overall). A temporal 75:25 split yielded an Original Training dataset of 95,856 samples (14.73% IDH rate) and a test set (15.21% IDH rate). From the Original Training dataset, a Generative Adversarial Network (GAN) was used to construct a balanced dataset of 163,470 samples. The GAN Balanced dataset yielded the highest predictive performance, with statistically significant improvements over the Original Training dataset across metrics, including Precision-Recall Area Under the Curve (PR-AUC) (mean 0.735 vs 0.724) and Accuracy (mean 0.900 vs 0.892). The GAN Augmented dataset (191,712 samples) showed mixed results: Accuracy and F1 improved, while Receiver Operating Characteristic Area Under the Curve (ROC-AUC) and PR-AUC decreased. The ADASYN (163,326 samples) and SMOTE (163,470 samples) balanced datasets significantly underperformed on PR-AUC. SHAP analysis identified Dialysis Date (as a proxy for temporal patterns such as day of week) and hemodynamic indicators (e.g., Systolic Diastolic Difference, Previous Systolic Pressure) as key IDH predictors.
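The reported dataset sizes can be cross-checked with simple arithmetic. The sketch below assumes that "balanced" means a 1:1 class ratio with every real training record retained. The abstract does not state the per-class counts, so the derived figures are inferences under that assumption, not reported results.

```python
# Figures reported in the abstract.
train_total = 95_856        # Original Training dataset
balanced_total = 163_470    # GAN Balanced (and SMOTE) dataset
augmented_total = 191_712   # GAN Augmented dataset

# Assumption: 1:1 balance, so each class holds half of the balanced set.
per_class = balanced_total // 2                 # 81,735 records per class
real_minority = train_total - per_class         # 14,121 real IDH records
synthetic_needed = per_class - real_minority    # 67,614 synthetic IDH records

# Implied IDH rate in the original training data, for comparison with 14.73%.
implied_rate = real_minority / train_total

print(per_class, real_minority, synthetic_needed, round(implied_rate * 100, 2))
```

Under this assumption the implied training IDH rate rounds to 14.73%, which matches the reported figure, and the GAN Augmented dataset is exactly twice the size of the Original Training dataset.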

CONCLUSION

The proposed CWGAN-GP framework effectively balances complex hemodialysis data, yielding significantly improved and interpretable IDH prediction models compared with standard approaches. This work supports the use of advanced generative models such as GANs to overcome data imbalance in clinical prediction tasks, pending further validation.
