Individual Factors Associated With COVID-19 Infection: A Machine Learning Study.

Affiliations

Cátedras Conacyt, National Council on Science and Technology, Mexico City, Mexico.

Center for Research in Geospatial Information Sciences, Mexico City, Mexico.

Publication information

Front Public Health. 2022 Jun 30;10:912099. doi: 10.3389/fpubh.2022.912099. eCollection 2022.

Abstract

The rapid, exponential increase in COVID-19 infections and their catastrophic effects on patients' health have required the development of tools that support health systems in the quick and efficient diagnosis and prognosis of this disease. In this context, the present study aims to identify potential factors associated with COVID-19 infection by applying machine learning techniques. Random forest, chi-squared, XGBoost, and rpart were used for feature selection, and ROSE and SMOTE were used as resampling methods to address class imbalance. Machine and deep learning algorithms such as support vector machines, C4.5, random forest, rpart, and deep neural networks were then explored during the train/test phase to select the best prediction model. The dataset contains clinical data, anthropometric measurements, and other health parameters related to smoking habits, alcohol consumption, quality of sleep, physical activity, and health status during the confinement imposed by the COVID-19 pandemic. The results showed that XGBoost selected the features most strongly associated with COVID-19 infection, and that random forest yielded the best predictive model, with a balanced accuracy of 90.41% using SMOTE as the resampling technique. The best-performing model provides a tool to help prevent SARS-CoV-2 infection, since it detects the variables that carry the highest risk, some of which are, to a certain extent, controllable.
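The abstract reports balanced accuracy rather than plain accuracy because the classes are imbalanced. The following is a minimal sketch of that metric for a binary problem, written in plain Python. It is not the authors' code; the function names and the toy labels are assumptions made purely for illustration, and the numbers are unrelated to the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity for a binary problem."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Hypothetical toy data: 90 negatives, 10 positives (imbalanced).
y_true = [0] * 90 + [1] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

plain = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)
```

On this toy input the majority-class predictor scores well on plain accuracy while its balanced accuracy falls to chance level, which is why balanced accuracy is the more informative figure when one class dominates.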

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e17/9279686/c8e457fdbe2c/fpubh-10-912099-g0001.jpg
