School of Computing and Information Technology, Great Bay University, Guangdong, China.
Department of Critical Care Medicine, Shenzhen People's Hospital, the First Affiliated Hospital of Southern University of Science and Technology, the Second Clinical Medicine College of Jinan University, Shenzhen, China.
PLoS Comput Biol. 2024 Oct 21;20(10):e1012083. doi: 10.1371/journal.pcbi.1012083. eCollection 2024 Oct.
Sepsis is a life-threatening condition characterized by an exaggerated immune response to pathogens, leading to organ damage and high mortality rates in the intensive care unit. Although deep learning has achieved impressive performance on prediction and classification tasks in medicine, it requires large amounts of data and lacks explainability, both of which hinder its application to sepsis diagnosis. We introduce a deep learning framework, called scCaT, which blends the capsulating architecture with the Transformer to develop a sepsis diagnostic model using single-cell RNA sequencing data and then transfers it to bulk RNA data. The capsulating architecture effectively groups genes into capsules based on biological functions, which provides explainability in encoding gene expressions. The Transformer serves as a decoder to classify sepsis patients and controls. Our model achieves strong discrimination, with an AUROC of 0.93 on the single-cell test set and an average AUROC of 0.98 across seven bulk RNA cohorts. Additionally, the capsules can recognize different cell types and distinguish sepsis from control samples based on their biological pathways. This study presents a novel approach for learning gene modules and transferring the model to other data types, offering potential benefits for diagnosing rare diseases for which few subjects are available.
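The abstract only outlines the architecture, so the following is a minimal, hypothetical Python sketch (standard library only) of the general idea of gene-expression capsules feeding a Transformer-style decoder. It is not the authors' scCaT implementation. Assumptions: each capsule is a separate linear projection of the full gene-expression vector followed by the squash nonlinearity used in capsule networks; the decoder is a single self-attention layer with mean pooling and a sigmoid output; all weights are random placeholders that would be learned in practice; and names such as `CapsuleTransformerSketch` and `top_genes` are illustrative only.

```python
import math
import random


def squash(s):
    """Capsule squash nonlinearity: keeps the direction of s, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = math.sqrt(norm_sq) / (1.0 + norm_sq)  # = (|s|^2 / (1 + |s|^2)) / |s|
    return [scale * x for x in s]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class CapsuleTransformerSketch:
    """Hypothetical toy model: capsules encode genes, one self-attention layer decodes."""

    def __init__(self, n_genes, n_capsules, dim, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        # One projection per capsule over the full gene vector (placeholder weights).
        self.capsule_W = [rand_matrix(dim, n_genes) for _ in range(n_capsules)]
        self.Wq, self.Wk, self.Wv = (rand_matrix(dim, dim) for _ in range(3))
        self.w_out = [rng.gauss(0.0, 0.5) for _ in range(dim)]
        self.dim = dim

    def encode(self, expr):
        # Each capsule becomes one token: a squashed dim-dimensional vector.
        return [squash(matvec(W, expr)) for W in self.capsule_W]

    def decode(self, caps):
        # Single-head scaled dot-product self-attention over the capsule tokens.
        q = [matvec(self.Wq, c) for c in caps]
        k = [matvec(self.Wk, c) for c in caps]
        v = [matvec(self.Wv, c) for c in caps]
        attended = []
        for qi in q:
            scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(self.dim) for kj in k]
            weights = softmax(scores)
            attended.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(self.dim)])
        # Mean-pool the tokens and map to a probability of "sepsis".
        pooled = [sum(tok[d] for tok in attended) / len(attended) for d in range(self.dim)]
        return sigmoid(sum(w * x for w, x in zip(self.w_out, pooled)))

    def predict(self, expr):
        caps = self.encode(expr)
        # Capsule length in [0, 1) acts as that capsule's activation.
        activations = [math.sqrt(sum(x * x for x in c)) for c in caps]
        return self.decode(caps), activations

    def top_genes(self, capsule, k=3):
        # Gene importance for one capsule: L2 norm of that gene's weights in the projection.
        W = self.capsule_W[capsule]
        scores = [math.sqrt(sum(row[g] ** 2 for row in W)) for g in range(len(W[0]))]
        return sorted(range(len(scores)), key=lambda g: -scores[g])[:k]


if __name__ == "__main__":
    model = CapsuleTransformerSketch(n_genes=10, n_capsules=4, dim=3)
    expr = [0.1 * g for g in range(10)]  # made-up expression values
    prob, acts = model.predict(expr)
    print(prob, acts, model.top_genes(0))
```

In this toy setup, a capsule's vector length serves as its activation, and the genes with the largest weights in a capsule's projection can be read off as a candidate gene module. This is one plausible way capsules can make the encoding of gene expression interpretable; the paper's actual capsule construction and training procedure may differ.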