Inonu University, Faculty of Medicine, Department of Biostatistics and Medical Informatics, Malatya, Turkey.
Comput Methods Programs Biomed. 2021 Apr;202:105996. doi: 10.1016/j.cmpb.2021.105996. Epub 2021 Feb 15.
COVID-19 progresses slowly and negatively affects many people. However, most infected people develop mild to moderate symptoms and recover without hospitalization. The development of early diagnosis and treatment strategies is therefore essential. One such approach is proteomic technology based on blood protein profiling. This study aims to classify three groups of COVID-19-positive patients (mild, severe, and critical) and a control group from blood protein profiles using deep learning (DL), random forest (RF), and gradient boosted trees (GBTs).
The dataset, obtained from an open-source website, consists of 93 samples (60 COVID-19 patients and 33 controls) and 370 variables: age, gender, and 368 proteins. DL and machine learning approaches (RF, GBTs) are used to model the relationship between disease severity and the proteins. An evolutionary algorithm tunes the hyperparameters of the models, and the predictions are assessed using accuracy, sensitivity, specificity, precision, F1 score, classification error, and kappa.
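As an illustration of the performance metrics listed above, the following minimal sketch computes them from a multi-class confusion matrix. The abstract does not state how per-class values were aggregated across the four groups, so the macro-averaging used here is an assumption, and the function name `classification_metrics` and the example counts are hypothetical rather than taken from the study.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute summary metrics from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Per-class metrics are macro-averaged
    (an assumption; the abstract does not specify the averaging).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(cm[i]) for i in range(k)]
    col_sums = [sum(cm[r][c] for r in range(k)) for c in range(k)]

    accuracy = sum(cm[i][i] for i in range(k)) / total

    sens, spec, prec, f1 = [], [], [], []
    for c in range(k):
        tp = cm[c][c]
        fn = row_sums[c] - tp
        fp = col_sums[c] - tp
        tn = total - tp - fn - fp
        s = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
        p = tp / (tp + fp) if tp + fp else 0.0   # precision
        sens.append(s)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        prec.append(p)
        f1.append(2 * p * s / (p + s) if p + s else 0.0)

    # Cohen's kappa: agreement beyond what the marginals predict by chance.
    expected = sum(row_sums[c] * col_sums[c] for c in range(k)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "accuracy": accuracy,
        "classification_error": 1 - accuracy,
        "sensitivity": sum(sens) / k,
        "specificity": sum(spec) / k,
        "precision": sum(prec) / k,
        "f1": sum(f1) / k,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts for control, mild, severe, critical (not study data).
    example = [
        [10, 1, 0, 0],
        [1, 8, 1, 0],
        [0, 1, 7, 1],
        [0, 0, 1, 6],
    ]
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Rows of the matrix are true classes and columns are predicted classes; swapping them would exchange sensitivity and precision.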
The accuracy of RF (96.21%) was higher than that of DL (94.73%). However, the ensemble classifier GBTs achieved the highest accuracy (96.98%). TGB1BP2 in the cardiovascular II panel and MILR1 in the inflammation panel were the two most important proteins associated with disease severity.
The proposed model (GBTs) predicted disease severity from the proteins more accurately than the other algorithms. The results indicate that changes in blood proteins associated with the severity of COVID-19 may be used in monitoring and in early diagnosis and treatment of the disease.