PCLDA: An interpretable cell annotation tool for single-cell RNA-sequencing data based on simple statistical methods.

Authors

Bai Kailun, Moa Belaid, Shao Xiaojian, Zhang Xuekui

Affiliations

Department of Mathematics and Statistics, University of Victoria, Victoria BC, Canada.

Digital Research Alliance of Canada, Victoria BC, Canada.

Publication information

Comput Struct Biotechnol J. 2025 Jul 23;27:3264-3274. doi: 10.1016/j.csbj.2025.07.019. eCollection 2025.

DOI: 10.1016/j.csbj.2025.07.019
PMID: 40778314
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12329077/

Abstract

Single-cell RNA sequencing (scRNA-seq) enables high-resolution analysis of cellular heterogeneity, yet accurate and consistent cell-type annotation remains a crucial challenge. Numerous automated tools exist, but their complex modeling assumptions can hinder reliability across varied datasets and protocols. We propose PCLDA, a pipeline composed of three modules: t-test-based gene screening, principal component analysis (PCA) and linear discriminant analysis (LDA), all built on simple statistical methods. An ablation study shows that each module in PCLDA contributes significantly to performance and robustness, with two novel enhancements in the second module yielding substantial gains. Despite these additions, the model retains its original assumptions, computational efficiency, and interpretability. Benchmarking against nine state-of-the-art methods across 22 public scRNA-seq datasets and 35 distinct evaluation scenarios, PCLDA consistently achieves top-tier accuracy under both intra-dataset (cross-validation) and inter-dataset (cross-platform) conditions. Notably, when reference and query data are generated via different protocols, PCLDA remains stable and often outperforms more complex machine-learning approaches. Furthermore, PCLDA offers strong interpretability, attributed to the linear nature of its PCA and LDA modules. The final decision boundaries are linear combinations of the original gene expression values, directly reflecting the contribution of each gene to the classification. Top-weighted genes identified by PCLDA better capture biologically meaningful signals in enrichment analyses than those selected via marginal screening alone, offering deeper functional insights into cell-type specificity. In conclusion, our work underscores the utility of carefully enhanced simple statistics methods for single-cell annotation. PCLDA's simplicity, interpretability, and consistently high performance make it a practical, reliable alternative to more complex annotation pipelines. Code is available on GitHub: https://github.com/kellen8hao/PCLDA.
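
The three-module structure described in the abstract (t-test screening, then PCA, then LDA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the GitHub repository for that), and it does not reproduce the two enhancements to the PCA module, which the abstract does not specify. The one-vs-rest Welch t-statistic, the function names `welch_t`, `fit`, `predict`, and `simulate`, the parameters `genes_per_class`, `n_pcs`, and `ridge`, and the synthetic data are all assumptions made for illustration. The sketch also shows why the model is interpretable: because PCA and LDA are both linear, the per-class discriminant scores collapse into a single weight per retained gene.

```python
import numpy as np


def welch_t(X, y):
    """One-vs-rest Welch t-statistics, shape (n_classes, n_genes). Assumed screening rule."""
    classes = np.unique(y)
    t = np.empty((classes.size, X.shape[1]))
    for i, c in enumerate(classes):
        a, b = X[y == c], X[y != c]
        se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
        t[i] = (a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    return classes, t


def fit(X, y, genes_per_class=5, n_pcs=5, ridge=1e-6):
    """Fit the screening -> PCA -> LDA sketch on reference cells X (cells x genes)."""
    classes, t = welch_t(X, y)
    # Module 1: keep the union of the top genes per class, ranked by |t|.
    keep = np.unique(np.argsort(-np.abs(t), axis=1)[:, :genes_per_class])
    # Module 2: plain PCA via SVD of the centered, screened matrix.
    mu = X[:, keep].mean(axis=0)
    Xc = X[:, keep] - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_pcs].T                      # loadings, (n_kept_genes, n_pcs)
    Z = Xc @ W                            # PC scores
    # Module 3: LDA with a pooled within-class covariance.
    means = np.stack([Z[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    resid = Z - means[np.searchsorted(classes, y)]
    cov = resid.T @ resid / (len(y) - classes.size) + ridge * np.eye(Z.shape[1])
    A = np.linalg.solve(cov, means.T).T   # per-class weights in PC space
    intercept = -0.5 * np.sum(A * means, axis=1) + np.log(priors)
    # Linear maps compose: score = (x - mu) @ W @ A.T + intercept,
    # so A @ W.T holds one weight per class and retained gene.
    return {"classes": classes, "keep": keep, "mu": mu,
            "gene_coef": A @ W.T, "intercept": intercept}


def predict(model, X):
    """Assign each query cell to the class with the highest discriminant score."""
    Xc = X[:, model["keep"]] - model["mu"]
    scores = Xc @ model["gene_coef"].T + model["intercept"]
    return model["classes"][np.argmax(scores, axis=1)]


def simulate(rng, n_per_class=30, n_genes=50, n_classes=3):
    """Synthetic data (hypothetical): each class over-expresses its own 5 genes."""
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = rng.normal(size=(y.size, n_genes))
    for c in range(n_classes):
        X[y == c, 5 * c:5 * c + 5] += 3.0
    return X, y


rng = np.random.default_rng(0)
X_ref, y_ref = simulate(rng)       # labeled reference cells
X_query, y_query = simulate(rng)   # independent query cells
model = fit(X_ref, y_ref)
acc = np.mean(predict(model, X_query) == y_query)
# Genes with the largest absolute weight for the first class.
top_genes_class0 = model["keep"][np.argsort(-np.abs(model["gene_coef"][0]))[:5]]
print(f"query accuracy: {acc:.2f}")
print("top-weighted genes for class 0:", top_genes_class0)
```

Ranking genes by the absolute value of their row in `gene_coef` is one simple way to read off per-gene contributions; how the paper selects its top-weighted genes for enrichment analysis may differ.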


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4a0/12329077/f35870ca791e/gr001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4a0/12329077/9f53f07d0441/gr002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4a0/12329077/d3d46bc8acc4/gr003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4a0/12329077/e97a2401e0af/gr004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4a0/12329077/41b57803eebd/gr005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4a0/12329077/97b7dd648892/gr006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4a0/12329077/a07554e73ad2/gr007.jpg
