Engineering novel features for diabetes complication prediction using synthetic electronic health records.

Author information

Voskergian Daniel, Bakir-Gungor Burcu, Yousef Malik

Affiliations

Computer Engineering Department, Al-Quds University, Jerusalem, Palestine.

Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Türkiye.

Publication information

Front Genet. 2025 Apr 14;16:1451290. doi: 10.3389/fgene.2025.1451290. eCollection 2025.

Abstract

Diabetes significantly affects millions of people worldwide, leading to substantial morbidity, disability, and mortality. Predicting diabetes-related complications from health records is crucial for early prevention and for the development of effective treatment plans. This study introduces a novel feature engineering approach for predicting four complications of diabetes mellitus: retinopathy, chronic kidney disease, ischemic heart disease, and amputations. To develop the classification models, we use the XGBoost feature selection method and several supervised machine learning algorithms, including Random Forest, XGBoost, LogitBoost, AdaBoost, and Decision Tree. The models were trained on synthetic electronic health records (EHRs) generated by dual-adversarial autoencoders. These EHRs represent nearly 1 million synthetic patients derived from an authentic cohort of 979,308 individuals with diabetes. The variables considered in the models were the age range together with the chronic diseases recorded during patient visits from the onset of diabetes onward. Throughout the experiments, XGBoost and Random Forest demonstrated the best overall prediction performance. The final models, which are tailored to each complication and trained using our feature engineering approach, achieved an accuracy between 69% and 77% and an AUC between 77% and 84% under cross-validation, while the partitioned validation approach yielded an accuracy between 59% and 78% and an AUC between 66% and 85%. These findings imply that our method surpasses the traditional Bag-of-Features approach, highlighting its effectiveness in enhancing model accuracy and robustness.
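The abstract does not spell out how the age range and chronic diseases are encoded, so the following is only an illustrative sketch of one possible reading, not the authors' implementation. It contrasts a plain Bag-of-Features count of diseases with features that pair each disease with the age range in which it was recorded. The visit records, age bins, disease names, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical visit records for one patient after diabetes onset:
# (age_range, chronic_disease) pairs. Bins and names are illustrative only.
visits = [
    ("40-49", "hypertension"),
    ("40-49", "hyperlipidemia"),
    ("50-59", "hypertension"),
]

def bag_of_features(records):
    """Count each disease, ignoring the age range in which it occurred."""
    return Counter(disease for _, disease in records)

def age_disease_features(records):
    """Count each (age range, disease) combination as a separate feature.

    This pairing is an assumed encoding, not one confirmed by the abstract.
    """
    return Counter(f"{age}|{disease}" for age, disease in records)

bof = bag_of_features(visits)
combined = age_disease_features(visits)

print(dict(bof))
print(dict(combined))
```

Under this assumed encoding, the combined representation keeps the information that hypertension was recorded in two different age ranges, which the plain count collapses into a single number.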

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dca/12041673/2e1f5ac6866a/fgene-16-1451290-g001.jpg
