scMatch: a single-cell gene expression profile annotation tool using reference datasets.

Affiliation

Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, The University of Western Australia, Nedlands, Perth, WA 6009, Australia.

Publication information

Bioinformatics. 2019 Nov 1;35(22):4688-4695. doi: 10.1093/bioinformatics/btz292.

DOI:10.1093/bioinformatics/btz292

PMID:31028376

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6853649/

Abstract

MOTIVATION

Single-cell RNA sequencing (scRNA-seq) measures gene expression at the resolution of individual cells. Massively multiplexed single-cell profiling has enabled large-scale transcriptional analyses of thousands of cells in complex tissues. In most cases, the true identity of individual cells is unknown and needs to be inferred from the transcriptomic data. Existing methods typically cluster (group) cells based on similarities of their gene expression profiles and assign the same identity to all cells within each cluster using the averaged expression levels. However, scRNA-seq experiments typically produce low-coverage sequencing data for each cell, which hinders the clustering process.

RESULTS

We introduce scMatch, which directly annotates single cells by identifying their closest match in large reference datasets. We used this strategy to annotate various single-cell datasets and evaluated the impacts of sequencing depth, similarity metric and reference datasets. We found that scMatch can rapidly and robustly annotate single cells with comparable accuracy to another recent cell annotation tool (SingleR), but that it is quicker and can handle larger reference datasets. We demonstrate how scMatch can handle large customized reference gene expression profiles that combine data from multiple sources, thus empowering researchers to identify cell populations in any complex tissue with the desired precision.
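The closest-match strategy described above can be illustrated with a minimal sketch. This is not scMatch's actual code or API: the function names (`average_ranks`, `spearman`, `annotate_cell`), the toy expression values, and the choice of Spearman rank correlation as the similarity metric are all assumptions made for illustration. Each cell is compared with every reference profile over a shared gene list and receives the label of the best-scoring reference, with no clustering step.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def annotate_cell(cell, references):
    """Return (label, score) of the reference profile most similar to the cell."""
    scores = {label: spearman(cell, profile) for label, profile in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data: expression of five shared genes (made-up numbers).
references = {
    "T cell": [9.0, 8.0, 0.5, 0.1, 1.0],
    "hepatocyte": [0.2, 0.1, 7.5, 9.0, 1.2],
}
cell = [5.0, 3.0, 0.0, 0.0, 1.0]  # sparse profile, as in low-coverage scRNA-seq

label, score = annotate_cell(cell, references)
print(label, round(score, 3))
```

A rank-based metric is used here because it depends only on the ordering of expression values, which makes it less sensitive to scale differences between a shallowly sequenced cell and a deeply profiled reference sample. Swapping `spearman` for `pearson` inside `annotate_cell` is one simple way to compare similarity metrics.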

AVAILABILITY AND IMPLEMENTATION

scMatch (Python code) and the FANTOM5 reference dataset are freely available to the research community at https://github.com/forrest-lab/scMatch.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45a2/6853649/614a75e087a4/btz292f1.jpg
