Laboratory for Medical Science Mathematics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Japan.
Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Japan.
Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad266.
Annotation of cell-types is a critical step in the analysis of single-cell RNA sequencing (scRNA-seq) data, as it allows the study of heterogeneity across multiple cell populations. Currently, this is most commonly done with unsupervised clustering algorithms, which project single-cell expression data into a lower-dimensional space and then cluster cells based on their distances from each other. However, because these methods do not use reference datasets, they achieve only a rough classification of cell-types, and their recognition accuracy is difficult to improve further. To address this issue, we propose a novel supervised annotation method, scDeepInsight. scDeepInsight handles several tasks: it integrates data through batch normalization, trains in a supervised manner on a reference dataset, detects outliers and annotates cell-types in query datasets. Moreover, it can help identify active genes or marker genes related to cell-types. The scDeepInsight model is trained in a distinctive way. Tabular scRNA-seq data are first converted to corresponding images through the DeepInsight methodology. DeepInsight creates a trainable image transformer that converts non-image RNA data to images by comprehensively comparing interrelationships among multiple genes. The converted images are then fed into a convolutional neural network such as EfficientNet-b3, which extracts features automatically to identify the cell-types of scRNA-seq samples. We benchmarked scDeepInsight against six other mainstream cell annotation methods. The average accuracy of scDeepInsight reached 87.5%, which is more than 7% higher than that of the state-of-the-art methods.
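The tabular-to-image step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or the DeepInsight package API: the function names `fit_gene_pixels` and `to_images`, the 32 x 32 grid, and the use of plain PCA to place genes on the grid are all assumptions chosen for illustration. The idea is only that genes with similar expression profiles across cells land on nearby pixels, and that each cell then becomes one image whose pixel values are its expression levels.

```python
import numpy as np

def fit_gene_pixels(X, size=32):
    """Assign each gene (column of X, shape cells x genes) a pixel on a size x size grid.

    Assumption: genes are embedded in 2-D with PCA; this is a stand-in for
    whatever embedding the real method uses.
    """
    # Treat genes as points described by their expression across cells.
    G = X.T - X.T.mean(axis=0, keepdims=True)
    # PCA via SVD; the first two components give 2-D gene coordinates.
    U, S, _ = np.linalg.svd(G, full_matrices=False)
    coords = U[:, :2] * S[:2]
    # Rescale the coordinates to integer pixel indices in [0, size - 1].
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against a degenerate axis
    return np.floor((coords - lo) / span * (size - 1)).astype(int)

def to_images(X, pix, size=32):
    """Render every cell (row of X) as a size x size image.

    Genes that share a pixel are averaged; pixels with no gene stay at zero.
    """
    sums = np.zeros((X.shape[0], size, size))
    counts = np.zeros((size, size))
    for g, (r, c) in enumerate(pix):
        sums[:, r, c] += X[:, g]
        counts[r, c] += 1
    counts[counts == 0] = 1  # avoid division by zero on empty pixels
    return sums / counts

# Hypothetical toy data: 50 cells, 200 genes.
rng = np.random.default_rng(0)
X_demo = rng.random((50, 200))
pix_demo = fit_gene_pixels(X_demo, size=32)
images = to_images(X_demo, pix_demo, size=32)  # shape (50, 32, 32)
```

The gene-to-pixel map is fitted once on the reference data and reused for query data, so that the same gene always occupies the same pixel. The resulting images could then be passed to an image classifier such as the EfficientNet-b3 network mentioned in the abstract.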