PCLDA: An interpretable cell annotation tool for single-cell RNA-sequencing data based on simple statistical methods.

Authors

Bai Kailun, Moa Belaid, Shao Xiaojian, Zhang Xuekui

Affiliations

Department of Mathematics and Statistics, University of Victoria, Victoria BC, Canada.

Digital Research Alliance of Canada, Victoria BC, Canada.

Publication information

Comput Struct Biotechnol J. 2025 Jul 23;27:3264-3274. doi: 10.1016/j.csbj.2025.07.019. eCollection 2025.

DOI:10.1016/j.csbj.2025.07.019

PMID:40778314

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12329077/

Abstract

Single-cell RNA sequencing (scRNA-seq) enables high-resolution analysis of cellular heterogeneity, yet accurate and consistent cell-type annotation remains a crucial challenge. Numerous automated tools exist, but their complex modeling assumptions can hinder reliability across varied datasets and protocols. We propose PCLDA, a pipeline composed of three modules: t-test-based gene screening, principal component analysis (PCA), and linear discriminant analysis (LDA), all built on simple statistical methods. An ablation study shows that each module in PCLDA contributes significantly to performance and robustness, with two novel enhancements in the second module yielding substantial gains. Despite these additions, the model retains its original assumptions, computational efficiency, and interpretability. Benchmarking against nine state-of-the-art methods across 22 public scRNA-seq datasets and 35 distinct evaluation scenarios, PCLDA consistently achieves top-tier accuracy under both intra-dataset (cross-validation) and inter-dataset (cross-platform) conditions. Notably, when reference and query data are generated via different protocols, PCLDA remains stable and often outperforms more complex machine-learning approaches. Furthermore, PCLDA offers strong interpretability, attributed to the linear nature of its PCA and LDA modules. The final decision boundaries are linear combinations of the original gene expression values, directly reflecting the contribution of each gene to the classification. Top-weighted genes identified by PCLDA better capture biologically meaningful signals in enrichment analyses than those selected via marginal screening alone, offering deeper functional insights into cell-type specificity. In conclusion, our work underscores the utility of carefully enhanced simple statistical methods for single-cell annotation. PCLDA's simplicity, interpretability, and consistently high performance make it a practical, reliable alternative to more complex annotation pipelines. Code is available on GitHub: https://github.com/kellen8hao/PCLDA.
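
The three-module structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the GitHub repository for that), and it does not include the two enhancements to the second module, which the abstract does not specify. It assumes a one-vs-rest Welch t-test for screening, plain scikit-learn PCA and LDA, and synthetic data; the function names (`screen_genes`, `fit_sketch`, `predict_sketch`, `gene_weights`) and the parameters `n_top` and `n_pcs` are hypothetical. The last function shows why the decision scores are linear in the original genes: the LDA coefficients in PC space are multiplied by the PCA loadings.

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def screen_genes(X, y, n_top=20):
    """One-vs-rest Welch t-test per class; keep the union of each class's top genes."""
    keep = set()
    for label in np.unique(y):
        t_stat, _ = stats.ttest_ind(X[y == label], X[y != label],
                                    axis=0, equal_var=False)
        t_stat = np.nan_to_num(t_stat)  # constant genes would give NaN
        keep.update(np.argsort(-np.abs(t_stat))[:n_top].tolist())
    return np.array(sorted(keep))


def fit_sketch(X, y, n_top=20, n_pcs=10):
    """Screen genes, project onto principal components, then fit LDA."""
    genes = screen_genes(X, y, n_top)
    pca = PCA(n_components=n_pcs).fit(X[:, genes])
    lda = LinearDiscriminantAnalysis().fit(pca.transform(X[:, genes]), y)
    return genes, pca, lda


def predict_sketch(model, X):
    genes, pca, lda = model
    return lda.predict(pca.transform(X[:, genes]))


def gene_weights(model):
    """Map LDA coefficients from PC space back to the screened genes.

    score = (x - pca.mean_) @ W.T + lda.intercept_, with W = coef_ @ components_.
    """
    _, pca, lda = model
    return lda.coef_ @ pca.components_


# Synthetic stand-in for log-normalized expression (an assumption, not real data):
# 3 cell types, 60 cells each, 200 genes, 10 shifted marker genes per type.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 60)
X = rng.normal(size=(180, 200))
for k in range(3):
    X[y == k, k * 10:(k + 1) * 10] += 3.0

# Even-indexed cells act as the reference, odd-indexed cells as the query.
X_train, y_train = X[::2], y[::2]
X_test, y_test = X[1::2], y[1::2]

model = fit_sketch(X_train, y_train)
acc = float(np.mean(predict_sketch(model, X_test) == y_test))
W = gene_weights(model)  # shape: (n_classes, n_screened_genes)
print(f"query accuracy: {acc:.3f}; weight matrix shape: {W.shape}")
```

Each row of `W` gives the per-gene weights for one cell type, so ranking the columns of a row by absolute value is one simple way to list the genes that drive that type's score in this sketch.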
