Literature search: search PubMed in Chinese

-The aim of this study was to identify effective data analytics and machine learning solutions that can support decision-making in the medical domain and contribute to the understanding of COVID-19. We analyzed anonymized electronic medical records of 4711 patients with COVID-19 admitted to hospital in Atlanta.

-We used random forest, LightGBM, XGBoost, CatBoost, k-nearest neighbors (KNN), support vector machine (SVM), logistic regression, and multilayer perceptron (MLP) neural network models. The models were evaluated with metrics appropriate to the type of prediction, chiefly accuracy, F1-score, and ROC AUC. A further aim was to determine which factors most affected disease severity and mortality risk among the patients. To identify the important features, we used several statistical methods, LASSO regression, and SHAP values, an explainable artificial intelligence (XAI) method for model explainability. The best models were implemented in a web application and tested by medical experts. The mortality-risk model was tested on a validation cohort of 45 patients from the Department of Infectiology and Travel Medicine, L. Pasteur University Hospital in Košice (UNLP).

-The best model for predicting COVID-19 severity was LightGBM, with an accuracy of 88.4% using all features and 89.5% using the eight most important features. The best model for predicting mortality risk was also LightGBM, which achieved a ROC AUC of 83.7% and a classification accuracy of 81.2% using all features. A simplified model trained on the 15 most important features achieved a ROC AUC of 83.6% and a classification accuracy of 80.5%. We deployed the simplified severity and mortality-risk models in a web-based application and tested them with medical experts. This test resulted in a ROC AUC of 83.6% and an overall prediction accuracy of 73.3%.
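
For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of how accuracy, F1-score, and ROC AUC can be computed for a binary outcome such as mortality. It is not the study's code: the function names and the toy labels and scores are illustrative assumptions, and the ROC AUC is computed with the simple pairwise-ranking definition rather than any particular library routine.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive case scores higher than a random
    negative case; ties count as half a win."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 decision threshold

print(accuracy(y_true, y_pred))   # 0.75
print(f1_score(y_true, y_pred))   # 0.75
print(roc_auc(y_true, y_score))   # 0.9375
```

Accuracy and F1 depend on the chosen decision threshold, whereas ROC AUC is computed from the raw scores, which is one reason a model can show a similar ROC AUC but a different accuracy on a new cohort.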
