Bai Kailun, Moa Belaid, Shao Xiaojian, Zhang Xuekui
Department of Mathematics and Statistics, University of Victoria, Victoria, BC V8P 5C2, Canada.
Digital Research Alliance of Canada, Victoria, BC V8P 5C2, Canada.
Brief Bioinform. 2025 Aug 31;26(5). doi: 10.1093/bib/bbaf446.
The emergence of single-cell RNA sequencing (scRNA-seq) technology has transformed our understanding of cellular diversity, but the high dimensionality and sparsity of scRNA-seq data make cell type annotation challenging. To address these issues, we present scSorterDL, a method that combines penalized linear discriminant analysis (pLDA), swarm learning, and deep neural networks (DNNs) to improve cell type classification. scSorterDL generates many random subsets of the data and fits a pLDA model to each subset, so that each model captures a different aspect of the data. A DNN then integrates the outputs of these models: it learns complex relationships among the pLDA scores and improves classification accuracy by capturing interactions that simpler combination methods might miss. Because both the swarm learning and deep learning stages run on GPUs, scSorterDL scales to large datasets and high-dimensional gene expression data. We evaluated scSorterDL on 13 real scRNA-seq datasets spanning diverse species, tissues, and platforms, and on 20 pairs of cross-platform datasets. scSorterDL outperformed nine existing cell annotation tools in both accuracy and robustness, in both cross-validation and cross-platform settings. These results suggest that scSorterDL is an effective and adaptable tool for automated cell type annotation in scRNA-seq research. The code is available on GitHub: https://github.com/kellen8hao/scSorterDL.
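The pipeline described in the abstract (many pLDA models fit on random subsets, with a DNN combining their scores) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions: the "random subsets" are taken to be random gene subsets; pLDA is approximated by ridge-regularized LDA (pooled covariance plus lambda times the identity), which may differ from the penalty used in scSorterDL; the combining DNN is reduced to a one-hidden-layer MLP trained by full-batch gradient descent on standardized scores; and GPU execution is not reproduced. All function names (`fit_ridge_lda`, `swarm_scores`, `train_mlp`, `fit_sketch`, `predict_sketch`) and the toy data are hypothetical.

```python
import numpy as np


def fit_ridge_lda(X, y, n_classes, lam=0.1):
    """Ridge-regularized LDA, a hypothetical stand-in for pLDA.

    Returns (W, b) such that X @ W + b gives one discriminant score per class.
    """
    d = X.shape[1]
    means = np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])  # K x d
    centered = X - means[y]
    cov = centered.T @ centered / max(len(X) - n_classes, 1)  # pooled within-class covariance
    cov_reg = cov + lam * np.eye(d)  # ridge penalty keeps the covariance invertible
    W = np.linalg.solve(cov_reg, means.T)  # d x K; column k = inv(cov_reg) @ mu_k
    priors = np.bincount(y, minlength=n_classes) / len(y)
    b = -0.5 * np.sum(means.T * W, axis=0) + np.log(priors)
    return W, b


def swarm_scores(X, models):
    """Concatenate the class scores of every subset model into one feature matrix."""
    return np.hstack([X[:, genes] @ W + b for genes, W, b in models])


def train_mlp(Z, y, n_classes, hidden=32, lr=0.1, epochs=300, seed=0):
    """One-hidden-layer ReLU MLP trained by full-batch gradient descent on cross-entropy."""
    rng = np.random.default_rng(seed)
    p = Z.shape[1]
    W1 = rng.normal(0.0, np.sqrt(2.0 / p), (p, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot labels
    for _ in range(epochs):
        H = np.maximum(Z @ W1 + b1, 0.0)
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)  # softmax probabilities
        G = (P - Y) / len(Z)  # gradient of mean cross-entropy w.r.t. logits
        GH = (G @ W2.T) * (H > 0)  # backpropagate through the ReLU (uses W2 before update)
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (Z.T @ GH)
        b1 -= lr * GH.sum(axis=0)
    return W1, b1, W2, b2


def fit_sketch(X, y, n_classes, n_models=50, subset_size=20, lam=0.1, seed=0):
    """Fit one LDA per random gene subset, then an MLP on their standardized scores."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        genes = rng.choice(X.shape[1], size=subset_size, replace=False)
        W, b = fit_ridge_lda(X[:, genes], y, n_classes, lam)
        models.append((genes, W, b))
    S = swarm_scores(X, models)
    mu, sd = S.mean(axis=0), S.std(axis=0) + 1e-8
    mlp = train_mlp((S - mu) / sd, y, n_classes, seed=seed)
    return models, mu, sd, mlp


def predict_sketch(X, fitted):
    models, mu, sd, (W1, b1, W2, b2) = fitted
    Z = (swarm_scores(X, models) - mu) / sd
    return np.argmax(np.maximum(Z @ W1 + b1, 0.0) @ W2 + b2, axis=1)


# Toy usage on synthetic count data (hypothetical, not from the paper).
rng = np.random.default_rng(42)
n_classes, n_genes, n_cells = 3, 200, 600
rates = rng.gamma(0.5, 1.0, size=(n_classes, n_genes))  # sparse-ish baseline expression
for k in range(n_classes):
    rates[k, k * 10:(k + 1) * 10] += 5.0  # 10 marker genes per cell type
y = rng.integers(0, n_classes, size=n_cells)
X = np.log1p(rng.poisson(rates[y]))  # log-transformed counts
fitted = fit_sketch(X[:400], y[:400], n_classes)
pred = predict_sketch(X[400:], fitted)
acc = float(np.mean(pred == y[400:]))
print(f"held-out accuracy: {acc:.2f}")
```

Feeding the concatenated per-model class scores to a learned combiner, rather than averaging them, is what lets the second stage weight or interact individual subset models; the MLP here is only the smallest version of that idea.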