Liu Qingchun, Xu Yan
School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China.
J Mol Biol. 2025 Feb 15;437(4):168936. doi: 10.1016/j.jmb.2025.168936. Epub 2025 Jan 9.
Single-cell RNA sequencing (scRNA-seq) analysis offers tremendous potential for addressing various biological questions, with one key application being the annotation of query datasets with unknown cell types using well-annotated external reference datasets. However, the performance of existing supervised or semi-supervised methods largely depends on the quality of the source data. Furthermore, these methods often struggle with the batch effects arising from different platforms when handling multiple reference or query datasets, making precise annotation challenging. We developed transCAE, a robust transfer learning-based algorithm for single-cell annotation that integrates unsupervised dimensionality reduction with supervised cell type classification. This approach fully leverages information from both reference and query datasets to achieve precise cell classification within the query data. Extensive evaluations show that transCAE significantly enhances classification accuracy and effectively mitigates batch effects. Compared to other state-of-the-art methods, transCAE demonstrates superior performance in experiments involving multiple reference or query datasets. These strengths position transCAE as an optimal annotation method for scRNA-seq datasets.
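The abstract describes the general pattern of pairing unsupervised dimensionality reduction, learned from both reference and query cells, with a supervised classifier trained on the reference labels. The following is a minimal sketch of that pattern only and is not the transCAE implementation: PCA and a nearest-centroid classifier are stand-ins chosen for brevity, the function names `joint_embed` and `annotate` are hypothetical, the expression matrices are synthetic, and no batch-effect handling is modelled.

```python
import numpy as np


def joint_embed(reference, query, n_components=2):
    """Unsupervised step (assumed stand-in): PCA fitted on reference and query together."""
    combined = np.vstack([reference, query])
    centered = combined - combined.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    latent = centered @ vt[:n_components].T
    return latent[: len(reference)], latent[len(reference):]


def annotate(ref_latent, ref_labels, query_latent):
    """Supervised step (assumed stand-in): nearest-centroid classification in latent space."""
    ref_labels = np.asarray(ref_labels)
    classes = np.unique(ref_labels)
    centroids = np.stack([ref_latent[ref_labels == c].mean(axis=0) for c in classes])
    # Distance from every query cell to every class centroid.
    dists = np.linalg.norm(query_latent[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


# Synthetic toy data: two cell types with distinct marker-gene blocks.
rng = np.random.default_rng(0)
n_genes = 20
profile_a = np.zeros(n_genes)
profile_a[:10] = 5.0
profile_b = np.zeros(n_genes)
profile_b[10:] = 5.0

reference = np.vstack([
    profile_a + rng.normal(0, 0.5, (30, n_genes)),
    profile_b + rng.normal(0, 0.5, (30, n_genes)),
])
ref_labels = ["A"] * 30 + ["B"] * 30
query = np.vstack([
    profile_a + rng.normal(0, 0.5, (10, n_genes)),
    profile_b + rng.normal(0, 0.5, (10, n_genes)),
])

ref_latent, query_latent = joint_embed(reference, query)
predicted = annotate(ref_latent, ref_labels, query_latent)
```

Fitting the embedding on the stacked reference and query matrices is what lets the query data inform the latent space, while the labels enter only in the classification step. A real scRNA-seq workflow would additionally need normalization, gene selection, and explicit treatment of batch effects.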