Comparative analysis of feature selection techniques for COVID-19 dataset.

Affiliations

Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Hearing Sciences, Mental Health and Clinical Neurosciences, School of Medicine, National Institute for Health and Care Research (NIHR) Nottingham Biomedical Research Center, University of Nottingham, Nottingham, UK.

Publication information

Sci Rep. 2024 Aug 11;14(1):18627. doi: 10.1038/s41598-024-69209-6.

DOI:10.1038/s41598-024-69209-6

PMID:39128991

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11317481/

Abstract

In the context of early disease detection, machine learning (ML) has emerged as a vital tool. Feature selection (FS) algorithms play a crucial role in ensuring the accuracy of predictive models by identifying the most influential variables. This study, focusing on a retrospective cohort of 4778 COVID-19 patients from Iran, explores the performance of various FS methods, including filter, embedded, and hybrid approaches, in predicting mortality outcomes. The researchers leveraged 115 routine clinical, laboratory, and demographic features and employed 13 ML models to assess the effectiveness of these FS methods based on classification accuracy, predictive accuracy, and statistical tests. The results indicate that a Hybrid Boruta-VI model combined with the Random Forest algorithm demonstrated superior performance, achieving an accuracy of 0.89, an F1 score of 0.76, and an AUC value of 0.95 on test data. Key variables identified as important predictors of adverse outcomes include age, oxygen saturation levels, albumin levels, neutrophil counts, platelet levels, and markers of kidney function. These findings highlight the potential of advanced FS techniques and ML models in enhancing early disease detection and informing clinical decision-making.
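The abstract reports three test-set metrics (accuracy, F1 score, and AUC). As a minimal sketch of how such metrics are commonly computed for a binary mortality classifier, the following uses only the Python standard library. The labels and scores are made-up toy values, not data from the study, and the function names are illustrative rather than taken from the authors' code.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = died, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]  # predicted risk
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # 0.5 threshold is an assumption

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.2f} f1={f1:.2f} auc={auc:.2f}")
```

Accuracy and F1 depend on the chosen decision threshold, whereas AUC is computed from the ranking of the scores alone. That difference is one reason a model can show a high AUC alongside a noticeably lower F1 score when the positive class is the minority.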

