Laatifi Mariam, Douzi Samira, Bouklouz Abdelaziz, Ezzine Hind, Jaafari Jaafar, Zaid Younes, El Ouahidi Bouabid, Naciri Mariam
Department of Biology, Faculty of Sciences, Mohammed V University, Rabat, Morocco.
FMPR, University Mohammed V, Rabat, Morocco.
J Big Data. 2022;9(1):5. doi: 10.1186/s40537-021-00557-0. Epub 2022 Jan 6.
The purpose of this study is to develop and test machine learning-based models for predicting COVID-19 severity. Test samples from 337 COVID-19-positive patients at Cheikh Zaid Hospital were grouped according to the severity of illness. Ours is the first study to estimate illness severity by combining biological and non-biological data from patients with COVID-19. Moreover, the use of ML for therapeutic purposes in Morocco is currently restricted, and ours is the first study to investigate the severity of COVID-19. When data analysis approaches were used to uncover patterns and essential characteristics in the data, C-reactive protein, platelets, and D-dimers were found to be the features most strongly associated with COVID-19 severity. Several data reduction algorithms were applied, and machine learning models were trained on patient data to predict the severity of illness. A new feature engineering method based on topological data analysis, Uniform Manifold Approximation and Projection (UMAP), was shown to achieve better results. With UMAP, prognostic prediction reached 100% on accuracy, specificity, sensitivity, and the ROC curve across different machine learning classifiers, including XGBoost, AdaBoost, Random Forest, and ExtraTrees. The proposed approach aims to help hospitals and medical facilities determine which patients should be seen first and which have a higher priority for hospital admission.
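The evaluation metrics named in the abstract (accuracy, sensitivity, specificity, and the ROC curve) can be illustrated with a minimal sketch. It assumes a binary labelling (1 = severe, 0 = non-severe), which the abstract does not specify, and uses made-up labels and scores rather than the study's data. The function names `binary_metrics` and `roc_auc` are hypothetical helpers, not part of the authors' code.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # true positive rate: severe cases correctly flagged
    specificity: float  # true negative rate: non-severe cases correctly cleared


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, and specificity from hard predictions."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative example")
    return BinaryMetrics(
        accuracy=(tp + tn) / total,
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative (made-up) labels, predictions, and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

On this toy input, accuracy, sensitivity, and specificity are each 0.75 and the area under the ROC curve is 0.9375. A result of 100% on all four, as reported in the abstract, would mean no misclassified patient and a perfect ranking of severe over non-severe cases on the evaluated data.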