Ren Jing-Xin, Gao Qian, Zhou Xiao-Chao, Chen Lei, Guo Wei, Feng Kai-Yan, Lu Lin, Huang Tao, Cai Yu-Dong
School of Life Sciences, Shanghai University, Shanghai 200444, China.
Department of Pharmacy, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China.
Biology (Basel). 2023 Jul 2;12(7):947. doi: 10.3390/biology12070947.
As COVID-19 develops, dynamic changes occur in the patient's immune system. Changes in molecular levels in different immune cells can reflect the course of COVID-19. This study aims to uncover the molecular characteristics of different immune cell subpopulations at different stages of COVID-19. We designed a machine learning workflow to analyze scRNA-seq data of three immune cell types (B, T, and myeloid cells) at four levels of COVID-19 severity/outcome. The datasets for the three cell types included 403,700 B-cell, 634,595 T-cell, and 346,547 myeloid cell samples. The cells of each type were divided into four groups (control, convalescence, progression mild/moderate, and progression severe/critical), and each cell was represented by 27,943 gene features. A feature analysis procedure was applied to the data of each cell type. Irrelevant features were first excluded according to their relevance to the target variable, measured by mutual information. Then, four ranking algorithms (least absolute shrinkage and selection operator, light gradient boosting machine, Monte Carlo feature selection, and max-relevance and min-redundancy) were adopted to analyze the remaining features, resulting in four feature lists. These lists were fed into incremental feature selection, incorporating three classification algorithms (decision tree, k-nearest neighbor, and random forest), to extract key gene features and construct classifiers with superior performance. The results confirmed that genes such as PFN1, RPS26, and FTH1 played important roles in SARS-CoV-2 infection. These findings provide a useful reference for understanding the ongoing effect of COVID-19 development on the immune system.
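The filter-rank-select workflow described above can be illustrated with a minimal sketch. This is not the authors' implementation: it runs on a small synthetic dataset, uses the mutual information score itself as the ranking (a stand-in for the four ranking algorithms named in the abstract), and evaluates feature subsets with a 1-nearest-neighbor classifier under leave-one-out validation. The threshold of 0.05, the toy data, and all function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def knn_loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier
    restricted to the given feature indices."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def incremental_feature_selection(rows, labels, ranked):
    """Score the top-1, top-2, ... prefixes of a ranked feature list and
    return the shortest prefix that reaches the best score."""
    best_k, best_score = 0, -1.0
    for k in range(1, len(ranked) + 1):
        score = knn_loo_accuracy(rows, labels, ranked[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranked[:best_k], best_score


# Toy data (assumption): 80 "cells", four classes standing in for the four
# severity groups. Features 0 and 1 jointly encode the class; 2-5 are noise.
rng = random.Random(0)
labels = [i % 4 for i in range(80)]
rows = [[y // 2, y % 2] + [rng.randint(0, 1) for _ in range(4)] for y in labels]

# Step 1: drop features whose mutual information with the label is low.
mi = {f: mutual_information([r[f] for r in rows], labels) for f in range(6)}
kept = [f for f in mi if mi[f] > 0.05]  # hypothetical threshold

# Step 2: rank the remaining features (here simply by mutual information).
ranked = sorted(kept, key=lambda f: -mi[f])

# Step 3: incremental feature selection over the ranked list.
selected, score = incremental_feature_selection(rows, labels, ranked)
print(sorted(selected), score)
```

On this toy data the procedure keeps only features 0 and 1, the two that jointly determine the class. In a real analysis each ranking algorithm would produce its own list, and each list would be passed through the same incremental loop with each classifier.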