Hu Huan, Feng Zhen, Shuai Xinghao Steven, Lyu Jie, Li Xiang, Lin Hai, Shuai Jianwei
Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China.
Wenzhou Institute and Wenzhou Key Laboratory of Biophysics, University of Chinese Academy of Sciences, Wenzhou, China.
Front Microbiol. 2023 Jul 10;14:1236653. doi: 10.3389/fmicb.2023.1236653. eCollection 2023.
Single-cell RNA sequencing (scRNA-seq) is a powerful tool for understanding cellular heterogeneity and identifying cell types in virus-related research. However, direct identification of SARS-CoV-2-infected cells at the single-cell level remains challenging, hindering the understanding of viral pathogenesis and the development of effective treatments.
In this study, we propose a deep learning framework, the single-cell virus detection network (scVDN), to predict the infection status of single cells. The scVDN is trained on scRNA-seq data from multiple nasal swab samples, collected from several contributors and spanning a range of cell types. To objectively evaluate scVDN's performance, we establish a model evaluation framework suitable for real experimental data.
Our results demonstrate that scVDN outperforms four state-of-the-art machine learning models in identifying SARS-CoV-2-infected cells, even with extremely imbalanced labels in real data. Specifically, scVDN achieves an AUC of 1 in four cell types. By enabling the identification of virus-infected cells at the single-cell level, which is critical for diagnosing and treating viral infections, these findings have important implications for advancing virus research and improving public health. The scVDN framework can be applied to other single-cell virus-related studies, and we make all source code and datasets publicly available on GitHub at https://github.com/studentiz/scvdn.
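As an illustration of the kind of metric reported here, the following minimal sketch computes a ROC AUC separately for each cell type from per-cell infection labels and predicted scores. It is not the scVDN evaluation framework itself: the function names `roc_auc` and `auc_by_cell_type`, the cell type names, and all numbers are hypothetical toy values chosen only to show how AUC behaves when infected cells are rare.

```python
import numpy as np


def roc_auc(labels, scores):
    """ROC AUC from the Mann-Whitney U statistic; tied scores share a mean rank."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC is undefined when only one class is present

    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    # Each group of tied scores gets the mean of its 1-based ranks.
    _, first, counts = np.unique(
        sorted_scores, return_index=True, return_counts=True
    )
    mean_rank = first + (counts + 1) / 2.0
    ranks = np.empty(scores.size, dtype=float)
    ranks[order] = np.repeat(mean_rank, counts)

    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def auc_by_cell_type(cell_types, labels, scores):
    """Return {cell type: AUC}, evaluating each cell type on its own cells."""
    cell_types = np.asarray(cell_types)
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    result = {}
    for ct in np.unique(cell_types):
        mask = cell_types == ct
        result[str(ct)] = roc_auc(labels[mask], scores[mask])
    return result


# Hypothetical toy data: one infected cell (label 1) among six per cell type.
cell_types = ["ciliated"] * 6 + ["goblet"] * 6
labels = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.9, 0.1, 0.6, 0.2, 0.1, 0.5, 0.3]

per_type = auc_by_cell_type(cell_types, labels, scores)
print(per_type)  # {'ciliated': 1.0, 'goblet': 0.8}
```

An AUC of 1 for a cell type means every infected cell of that type received a higher score than every uninfected cell of that type. Because AUC depends only on the ranking of scores, it is not inflated by simply predicting the majority class, which is one reason it is a common choice when labels are heavily imbalanced.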