PyCaret for Predicting Type 2 Diabetes: A Phenotype- and Gender-Based Approach with the "Nurses' Health Study" and the "Health Professionals' Follow-Up Study" Datasets.

Authors

Gul Sebnem, Ayturan Kubilay, Hardalaç Fırat

Affiliation

Department of Electrical and Electronics Engineering, Faculty of Engineering, Graduate School of Natural and Applied Sciences, Gazi University, Ankara 06570, Turkey.

Publication information

J Pers Med. 2024 Jul 29;14(8):804. doi: 10.3390/jpm14080804.

Abstract

Predicting type 2 diabetes mellitus (T2DM) from phenotypic data with machine learning (ML) techniques has received significant attention in recent years. PyCaret, a low-code automated ML tool that enables the simultaneous application of 16 different algorithms, was used to predict T2DM from phenotypic variables in the "Nurses' Health Study" and "Health Professionals' Follow-up Study" datasets. Ridge Classifier, Linear Discriminant Analysis, and Logistic Regression (LR) were the best-performing models for the male-only data subset. For the female-only data subset, LR, Gradient Boosting Classifier, and CatBoost Classifier were the strongest models. The AUC, accuracy, and precision were approximately 0.77, 0.70, and 0.70 for males and 0.79, 0.70, and 0.71 for females, respectively. The feature importance plot showed that family history of diabetes (famdb), never having smoked, and high blood pressure (hbp) were the most influential features in females, while famdb, hbp, and currently being a smoker were the major variables in males. In conclusion, PyCaret was used successfully for the prediction of T2DM by simplifying complex ML tasks. Gender differences are important to consider for T2DM prediction. Even with such a comprehensive ML tool, phenotypic variables alone may not be sufficient for early T2DM prediction; future studies could combine them with genotypic variables.
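The three metrics reported in the abstract (AUC, accuracy, and precision) can be illustrated with a minimal pure-Python sketch that evaluates each sex subset separately. This is not the authors' pipeline and does not use PyCaret. The toy records, the `evaluate_by_group` helper, and the 0.5 decision threshold are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positives divided by all predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative (pairwise Mann-Whitney form; ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate_by_group(
    records: List[Tuple[str, int, float]], threshold: float = 0.5
) -> Dict[str, Dict[str, float]]:
    """Compute the three metrics separately for each group label."""
    results: Dict[str, Dict[str, float]] = {}
    for group in sorted({g for g, _, _ in records}):
        y_true = [t for g, t, _ in records if g == group]
        scores = [s for g, _, s in records if g == group]
        y_pred = [1 if s >= threshold else 0 for s in scores]
        results[group] = {
            "auc": roc_auc(y_true, scores),
            "accuracy": accuracy(y_true, y_pred),
            "precision": precision(y_true, y_pred),
        }
    return results


# Hypothetical toy records: (sex, true T2DM label, predicted probability).
# These values are invented and unrelated to the study data.
toy_records = [
    ("M", 1, 0.9), ("M", 1, 0.4), ("M", 0, 0.6), ("M", 0, 0.2),
    ("F", 1, 0.8), ("F", 1, 0.7), ("F", 0, 0.3), ("F", 0, 0.5),
]

metrics = evaluate_by_group(toy_records)
for sex, values in metrics.items():
    print(sex, {name: round(v, 3) for name, v in values.items()})
```

AUC is computed from the predicted scores and is independent of the threshold, whereas accuracy and precision depend on the chosen cutoff, which is one reason the three numbers can diverge for the same model.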

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/476a/11355927/1103aada809c/jpm-14-00804-g001.jpg