College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China.
Department of Artificial Intelligence, Faculty of Computer Science, Campus de Montegancedo, Polytechnical University of Madrid, Boadilla del Monte, 28660 Madrid, Spain.
Biomolecules. 2023 Mar 28;13(4):611. doi: 10.3390/biom13040611.
Single-cell transcriptomics is rapidly advancing our understanding of the composition of complex tissues and biological cells, and single-cell RNA sequencing (scRNA-seq) holds great potential for identifying and characterizing the cell composition of complex tissues. Cell type identification from scRNA-seq data is limited mainly by manual annotation, which is time-consuming and irreproducible. As scRNA-seq technology scales to thousands of cells per experiment, the exponential growth in the number of cell samples makes manual annotation even more difficult. In addition, the sparsity of gene transcriptome data remains a major challenge. This paper applies the idea of the transformer to single-cell classification tasks based on scRNA-seq data. We propose scTransSort, a cell-type annotation method pretrained on single-cell transcriptomics data. scTransSort represents genes as gene expression embedding blocks, which reduces both the sparsity of the data used for cell type identification and the computational complexity. Its distinguishing feature is intelligent information extraction from unordered data: it automatically extracts valid cell-type features without manually labeled features or additional references. In experiments on cells from 35 human and 26 mouse tissues, scTransSort achieved high accuracy and strong performance in cell type identification and demonstrated high robustness and generalization ability.