Qin Rui, Mahal Lara K, Bojar Daniel
Department of Chemistry, University of Alberta, Edmonton, AB T6G 2G2, Canada.
Department of Chemistry and Molecular Biology, University of Gothenburg, 405 30 Gothenburg, Sweden.
iScience. 2022 Sep 19;25(10):105163. doi: 10.1016/j.isci.2022.105163. eCollection 2022 Oct 21.
Glycosylation is ubiquitous and often dysregulated in disease. However, the regulation and functional significance of the various types of glycosylation at the cellular level are hard to unravel experimentally. Multi-omics single-cell measurements such as SUGAR-seq, which quantifies both transcriptomes and cell surface glycans, help address this issue. Using SUGAR-seq data, we pioneered a deep learning model that predicts the glycan phenotypes of cells (mouse T lymphocytes) from their transcripts, using the prediction of β1,6GlcNAc-branching across T cell subtypes as an example (test set F1 score: 0.9351). Model interpretation via SHAP (SHapley Additive exPlanations) identified highly predictive genes, some of which are known to affect (i) branched glycan levels and (ii) the biology of branched glycans. These genes included physiologically relevant low-abundance genes that were not captured by conventional differential expression analysis. Our work shows that interpretable deep learning models are promising for uncovering novel functions and regulatory mechanisms of glycans from integrated transcriptomic and glycomic datasets.
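To illustrate the attribution idea behind SHAP, the following minimal sketch computes exact Shapley values for a toy scoring function. It is not the authors' code: the function `toy_model`, the three "genes", their values, and the all-zero baseline are hypothetical, and practical SHAP implementations approximate these values rather than enumerating every feature subset as done here.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at input `x` relative to `baseline`.

    Enumerates every feature subset, so it is only feasible for a handful
    of features (illustration only).
    """
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take their observed value; the rest stay
        # at the baseline value.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return model(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for subsets of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(expr):
    # Hypothetical score over three hypothetical genes: gene_a has a direct
    # effect, gene_a and gene_b interact, and gene_c has no effect.
    gene_a, gene_b, gene_c = expr
    return 2.0 * gene_a + 1.0 * gene_a * gene_b + 0.0 * gene_c


x = [1.0, 2.0, 5.0]         # hypothetical expression values for one cell
baseline = [0.0, 0.0, 0.0]  # hypothetical reference cell
phi = shapley_values(toy_model, x, baseline)
```

In this toy case the interaction term is split evenly between the two interacting genes, the gene that never affects the score receives zero attribution, and the attributions sum to the difference between the score for `x` and the score for `baseline`.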