XGBoost-Based Framework for Smoking-Induced Noncommunicable Disease Prediction.

Affiliations

Database and Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea.

Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh 700000, Vietnam.

Publication information

Int J Environ Res Public Health. 2020 Sep 7;17(18):6513. doi: 10.3390/ijerph17186513.

Abstract

Smoking-induced noncommunicable diseases (SiNCDs) have become a significant threat to public health and a major cause of death globally. In the last decade, numerous studies have applied artificial intelligence techniques to predict the risk of developing SiNCDs. However, identifying the most significant features and developing interpretable models remain challenging in such systems. In this study, we propose an efficient extreme gradient boosting (XGBoost)-based framework that incorporates a hybrid feature selection (HFS) method to predict SiNCDs in the general populations of South Korea and the United States. HFS is performed in three stages: (I) significant features are selected using the t-test and the chi-square test; (II) multicollinearity analysis is applied to retain mutually dissimilar features; and (III) the best representative features are finally selected using the least absolute shrinkage and selection operator (LASSO). The selected features are then fed into the XGBoost predictive model. The experimental results show that the proposed model outperforms several existing baseline models. In addition, the proposed model identifies the most important features, which enhances the interpretability of SiNCD prediction. Consequently, the XGBoost-based framework is expected to contribute to the early diagnosis and prevention of SiNCDs as a public health concern.
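The three-stage HFS procedure summarized above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the multicollinearity criterion, thresholds, or LASSO variant, so the following are all assumptions: Welch's t-test, the significance level of 0.05, the variance inflation factor (VIF) cutoff of 10, the L1 penalty `lam`, the integer coding of categorical features, and every function name (`stage1_statistical_filter`, `stage2_drop_collinear`, `stage3_lasso`, `hybrid_feature_selection`).

```python
import numpy as np
from scipy import stats


def stage1_statistical_filter(X, y, is_categorical, alpha=0.05):
    """Stage I: keep features significantly associated with the binary label y."""
    keep = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if is_categorical[j]:
            # Category-by-label contingency table, then chi-square test of independence.
            cats = np.unique(col)
            table = np.array([[np.sum((col == c) & (y == k)) for k in (0, 1)]
                              for c in cats])
            _, p, _, _ = stats.chi2_contingency(table)
        else:
            # Welch's two-sample t-test (an assumption) between the two label groups.
            _, p = stats.ttest_ind(col[y == 1], col[y == 0], equal_var=False)
        if p < alpha:
            keep.append(j)
    return keep


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    n, m = X.shape
    vifs = np.empty(m)
    for j in range(m):
        # Design matrix: intercept plus every column except j.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1.0 - (resid @ resid) / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def stage2_drop_collinear(X, cols, threshold=10.0):
    """Stage II: repeatedly drop the feature with the largest VIF until all VIFs <= threshold."""
    cols = list(cols)
    while len(cols) > 1:
        vifs = variance_inflation_factors(X[:, cols])
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        cols.pop(worst)
    return cols


def soft_threshold(z, lam):
    """Proximal operator of lam * |b|: shrink z toward zero by lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def stage3_lasso(X, y, cols, lam=0.01, n_iter=200):
    """Stage III: LASSO via cyclic coordinate descent; keep features with nonzero weight.

    Minimizes (1 / (2n)) * ||y_c - Z b||^2 + lam * ||b||_1, where Z holds the
    standardized columns and y_c is the centered 0/1 label.
    """
    if not cols:
        return []
    Z = X[:, cols]
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)   # each column: mean 0, (1/n) z.z = 1
    r = y - y.mean()                              # residual while all weights are zero
    n = len(r)
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        for j in range(Z.shape[1]):
            rho = Z[:, j] @ r / n + beta[j]      # correlation with the partial residual
            new = soft_threshold(rho, lam)
            r -= Z[:, j] * (new - beta[j])       # keep r equal to y_c - Z @ beta
            beta[j] = new
    return [c for c, b in zip(cols, beta) if b != 0.0]


def hybrid_feature_selection(X, y, is_categorical):
    """Run stages I-III in order and return the surviving column indices."""
    s1 = stage1_statistical_filter(X, y, is_categorical)
    s2 = stage2_drop_collinear(X, s1)
    return stage3_lasso(X, y, s2)
```

The returned column indices would then be used to train the XGBoost classifier, for example with `xgboost.XGBClassifier` from the xgboost Python package. That step is left out so that the sketch depends only on NumPy and SciPy. The stages are ordered from cheapest to most expensive, so each one shrinks the candidate set before the next runs.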
