

Prediction of Gestational Diabetes Mellitus under Cascade and Ensemble Learning Algorithm.

Author information

Department of Obstetrics, Xianyang Central Hospital, Xianyang City 712000, China.

Department of Hematology Endocrinology, Xianyang Hospital of Yan'an University, Xianyang City 712000, China.

Publication information

Comput Intell Neurosci. 2022 Jul 14;2022:3212738. doi: 10.1155/2022/3212738. eCollection 2022.

Abstract

Gestational diabetes mellitus (GDM) is a risk factor for fetal dysplasia and for maternal difficulties during pregnancy, so predicting GDM risk in advance matters to millions of families. This study applies machine learning to GDM prediction. First, the data are preprocessed, with outliers replaced by the mean value. Features are then screened by information value (IV): of the 83 features in the original sample data, 40 important features are retained through feature engineering. On this basis, logistic regression, Lasso-logistic regression, Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost) models are built, and the multiple learners are integrated into an ensemble. Finally, the constructed model is tested on data sets. The proposed model achieves an accuracy of 80.3%, a precision of 74.6%, and a recall of 79.3%, with a running time of only 2.53 seconds. This indicates that the proposed model outperforms previous models in accuracy, precision, recall, and F1 score, and that its time consumption meets practical engineering requirements. The proposed scheme offers some ideas for research on machine learning in disease prediction.
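The abstract does not spell out how the IV screening step is computed, so the following is only a minimal sketch of the commonly used information value formula, IV = Σ (p_i − q_i) · ln(p_i / q_i), where p_i and q_i are the shares of positive and negative cases falling in bin i of a feature. The function names, the smoothing constant `eps`, the toy data, and the cutoff `threshold` are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def information_value(feature_bins, labels, eps=0.5):
    """Information value of one binned feature against a binary label.

    feature_bins: bin identifier per sample (the feature is assumed pre-binned).
    labels: 1 for a positive case (e.g. GDM), 0 otherwise.
    eps: additive smoothing so empty bins do not produce log(0) (an assumption).
    """
    bins = sorted(set(feature_bins))
    pos_total = sum(labels)
    neg_total = len(labels) - pos_total
    pos, neg = Counter(), Counter()
    for b, y in zip(feature_bins, labels):
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    iv = 0.0
    for b in bins:
        # Smoothed share of positives and negatives that fall in this bin.
        p = (pos[b] + eps) / (pos_total + eps * len(bins))
        q = (neg[b] + eps) / (neg_total + eps * len(bins))
        iv += (p - q) * math.log(p / q)
    return iv


def screen_features(table, labels, threshold):
    """Keep the feature names whose IV reaches a hypothetical cutoff."""
    return [name for name, col in table.items()
            if information_value(col, labels) >= threshold]


# Toy data, invented purely for illustration.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
table = {
    "informative": ["high", "high", "high", "low", "low", "low", "low", "high"],
    "uninformative": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
iv_informative = information_value(table["informative"], labels)
iv_uninformative = information_value(table["uninformative"], labels)
selected = screen_features(table, labels, threshold=0.1)
```

In this toy case the feature whose bins are split evenly across both classes gets an IV of zero and is dropped, while the feature that separates the classes is kept. A real pipeline would also need a binning step for continuous variables, which is omitted here.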


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/afe0/9303101/97decd3eb3e4/CIN2022-3212738.001.jpg
