A comparison of marker gene selection methods for single-cell RNA sequencing data.

Affiliations

Bioinformatics and Cellular Genomics, St Vincent's Institute of Medical Research, 9 Princes St, Fitzroy, 3065, VIC, Australia.

School of Mathematics and Statistics, University of Melbourne, Parkville, 3010, VIC, Australia.

Publication information

Genome Biol. 2024 Feb 26;25(1):56. doi: 10.1186/s13059-024-03183-0.

DOI: 10.1186/s13059-024-03183-0

PMID: 38409056

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10895860/

Abstract

BACKGROUND

The development of single-cell RNA sequencing (scRNA-seq) has enabled scientists to catalog and probe the transcriptional heterogeneity of individual cells in unprecedented detail. A common step in the analysis of scRNA-seq data is the selection of so-called marker genes, most commonly to enable annotation of the biological cell types present in the sample. In this paper, we benchmark 59 computational methods for selecting marker genes in scRNA-seq data.
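
One common way to select marker genes is a one-vs-rest comparison: for each cluster, every gene's expression in that cluster's cells is compared with its expression in all remaining cells, and genes are ranked by a test statistic. The following is a minimal plain-Python sketch of that setup, assuming Welch's t-statistic (the unequal-variance form of the t-test) as the score. The function names `welch_t` and `rank_markers` and the toy expression values are hypothetical illustrations, not the paper's code or any package's API.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for group a versus group b (each needs >= 2 values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # No within-group spread: report any difference as infinitely strong.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_markers(expr, labels, cluster):
    """Rank genes as candidate markers for `cluster` versus all other cells.

    expr:   dict mapping gene name -> list of per-cell expression values
    labels: list of cluster labels, one per cell, in the same cell order
    """
    in_idx = [i for i, lab in enumerate(labels) if lab == cluster]
    out_idx = [i for i, lab in enumerate(labels) if lab != cluster]
    scores = {}
    for gene, values in expr.items():
        inside = [values[i] for i in in_idx]
        outside = [values[i] for i in out_idx]
        scores[gene] = welch_t(inside, outside)
    # Largest positive statistic first: genes most up-regulated in the cluster.
    return sorted(scores, key=scores.get, reverse=True)


# Toy example (hypothetical values): three cells labelled "T", three labelled "B".
expr = {
    "CD3E": [5.0, 6.0, 5.5, 0.0, 0.5, 0.2],
    "MS4A1": [0.1, 0.0, 0.3, 4.0, 5.0, 4.5],
    "ACTB": [3.0, 3.2, 2.9, 3.1, 3.0, 2.8],
}
labels = ["T", "T", "T", "B", "B", "B"]
print(rank_markers(expr, labels, "T"))  # ['CD3E', 'ACTB', 'MS4A1']
```

The top-ranked genes for each cluster would then serve as its candidate markers; the housekeeping-like gene ACTB lands in the middle because it barely differs between groups.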

RESULTS

We compare the performance of the methods using 14 real scRNA-seq datasets and over 170 additional simulated datasets. Methods are compared on their ability to recover simulated and expert-annotated marker genes, the predictive performance and characteristics of the gene sets they select, their memory usage and speed, and their implementation quality. In addition, various case studies are used to scrutinize the most commonly used methods, highlighting issues and inconsistencies.
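
Recovery of known marker genes can be scored in several ways, and the abstract does not state which metric is used. Purely as an assumed illustration, the sketch below computes precision and recall at a cut-off k, comparing a method's top-k ranked genes with a set of simulated or expert-annotated markers. `recovery_at_k` and the gene lists are hypothetical.

```python
def recovery_at_k(ranked_genes, true_markers, k):
    """Precision and recall of the top-k ranked genes against known markers.

    ranked_genes: list of gene names, best candidate first (assumed unique)
    true_markers: collection of known marker gene names
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_genes[:k]
    truth = set(true_markers)
    hits = len(truth.intersection(top_k))
    # Precision: share of selected genes that are true markers.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: share of true markers that were selected.
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


ranking = ["CD3E", "CD3D", "ACTB", "MS4A1"]
known = {"CD3E", "CD3D", "CD2"}
print(recovery_at_k(ranking, known, k=3))  # (0.6666666666666666, 0.6666666666666666)
```

Sweeping k shows the trade-off: a small k favours precision, while a large k favours recall.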

CONCLUSIONS

Overall, we present a comprehensive evaluation of methods for selecting marker genes in scRNA-seq data. Our results highlight the efficacy of simple methods, especially the Wilcoxon rank-sum test, Student's t-test, and logistic regression.
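
The Wilcoxon rank-sum test (equivalent to the Mann-Whitney U test) compares two groups through the ranks of their pooled values rather than the raw values, which makes it less sensitive to outliers than a t-test. The sketch below is a minimal plain-Python version for a single gene. It assumes a two-sided test using the large-sample normal approximation with a tie correction and no continuity correction. The function names are hypothetical, and this is not the implementation benchmarked in the paper or the one in any specific package. Applied one-vs-rest, `a` would hold a gene's values in one cluster and `b` its values in all other cells.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(a, b):
    """Two-sided Wilcoxon rank-sum test of group a versus group b.

    Returns (z, p) from the normal approximation with a tie correction.
    """
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for group a
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 0.0, 1.0  # all values tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


z, p = wilcoxon_rank_sum([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
print(z, p)
```

Genes would then be ranked within each cluster by z (or by p-value), and with thousands of genes tested the p-values would normally be adjusted for multiple testing.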
