ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy.
Dipartimento di Ingegneria, Università degli studi di Palermo, Viale Delle Scienze, ed. 6, 90128, Palermo, Italy.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad332.
Single-cell RNA sequencing (scRNA-seq) makes it possible to obtain genomic and transcriptomic profiles of individual cells, and these data allow tissues to be characterized at the cellular level. In this context, one of the main analyses of scRNA-seq data is identifying the cell types within a tissue to estimate the quantitative composition of its cell populations. Given the massive amount of available scRNA-seq data, automatic cell-typing approaches based on recent deep learning technology are needed. Here, we present the gene ontology-driven wide and deep learning (GOWDL) model for classifying cell types in several tissues. GOWDL implements a hybrid architecture that exploits both the functional annotations found in Gene Ontology and the marker genes typical of specific cell types. We performed cross-validation and independent external testing, comparing our algorithm with 12 other state-of-the-art predictors. Classification scores showed that GOWDL achieved the best results across five different tissues, except for recall, where it reached about 92% versus 97% for the best tool. Finally, we present a case study on classifying immune cell populations in breast cancer using a hierarchical approach based on GOWDL.
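The abstract describes GOWDL's hybrid architecture only at a high level. As a rough, non-authoritative illustration of what a GO-driven wide-and-deep classifier can look like, the pure-Python sketch below (forward pass only, untrained) assumes: a deep path whose first layer has one unit per GO term, masked so that each unit only sees the genes annotated to that term; a wide path that is a plain linear model on marker-gene expression; and class logits formed by summing the two paths. The names `WideAndDeepSketch` and `build_go_mask`, the toy annotations, the layer sizes and the merging rule are hypothetical and are not taken from the GOWDL implementation.

```python
import math
import random


def build_go_mask(genes, go_annotations):
    """Return (terms, mask) where mask[t][g] is 1.0 if gene g is annotated to term t."""
    terms = sorted(go_annotations)
    mask = [[1.0 if g in go_annotations[t] else 0.0 for g in genes] for t in terms]
    return terms, mask


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class WideAndDeepSketch:
    """Hypothetical GO-driven wide-and-deep classifier (forward pass only, untrained)."""

    def __init__(self, genes, go_annotations, marker_genes, n_classes, seed=0):
        rng = random.Random(seed)
        self.genes = list(genes)
        self.terms, self.mask = build_go_mask(self.genes, go_annotations)
        # Deep path, layer 1: one unit per GO term; weights are masked so each
        # unit only receives input from the genes annotated to that term.
        self.w_go = [[rng.gauss(0.0, 0.1) * m for m in row] for row in self.mask]
        self.b_go = [0.0] * len(self.terms)
        # Deep path, layer 2: dense map from GO-term units to class logits.
        self.w_deep = [[rng.gauss(0.0, 0.1) for _ in self.terms] for _ in range(n_classes)]
        # Wide path: plain linear model on marker-gene expression only.
        self.marker_idx = [i for i, g in enumerate(self.genes) if g in marker_genes]
        self.w_wide = [[rng.gauss(0.0, 0.1) for _ in self.marker_idx] for _ in range(n_classes)]
        self.bias = [0.0] * n_classes

    def predict_proba(self, x):
        """x: expression values aligned with self.genes; returns class probabilities."""
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU
                  for row, b in zip(self.w_go, self.b_go)]
        markers = [x[i] for i in self.marker_idx]
        logits = []
        for c, b in enumerate(self.bias):
            deep = sum(w * h for w, h in zip(self.w_deep[c], hidden))
            wide = sum(w * v for w, v in zip(self.w_wide[c], markers))
            logits.append(deep + wide + b)  # assumption: the two paths are summed
        return softmax(logits)


# Toy example: the annotations and markers are illustrative, not real GO data.
genes = ["CD3E", "CD19", "MS4A1", "GAPDH"]
go_annotations = {"T_cell_activation": {"CD3E"}, "B_cell_activation": {"CD19", "MS4A1"}}
marker_genes = {"CD3E", "MS4A1"}

model = WideAndDeepSketch(genes, go_annotations, marker_genes, n_classes=2)
p = model.predict_proba([5.0, 0.1, 0.2, 8.0])
print(model.terms, p)
```

Because the masked weights are zero, changing the expression of a gene that is neither GO-annotated nor a marker (here `GAPDH`) leaves the prediction unchanged. A real training loop would also have to keep those weights at zero, for example by reapplying the mask after every update.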