Evaluating imputation methods for single-cell RNA-seq data.

Affiliations

School of Intelligence Science and Technology, Key Laboratory of Machine Perception (MOE), Peking University, Beijing, 100871, China.

Department of Immunology, NHC Key Laboratory of Medical Immunology (Peking University), School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China.

Publication details

BMC Bioinformatics. 2023 Jul 28;24(1):302. doi: 10.1186/s12859-023-05417-7.

DOI:10.1186/s12859-023-05417-7

PMID:37507764

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10386301/

Abstract

BACKGROUND

Single-cell RNA sequencing (scRNA-seq) enables high-throughput profiling of gene expression at the single-cell level. However, the large number of dropouts in the data, where transcripts present in a cell go undetected and are recorded as zeros, may obscure meaningful biological signals. Various imputation methods have recently been developed to address this problem, so a systematic evaluation of these algorithms is needed.
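As a minimal illustration of how dropouts arise and why they obscure signal, the sketch below simulates them under an assumed model: "true" counts are drawn from a Poisson distribution with gene-specific means, and each entry is then zeroed with a probability that falls as true expression rises (a logistic dropout curve). The functions `simulate_counts` and `add_dropouts`, and all parameter values, are hypothetical choices for illustration; they are not the simulation scheme used in the study.

```python
import numpy as np

def simulate_counts(n_cells=200, n_genes=50, seed=0):
    """Hypothetical 'true' count matrix (cells x genes): Poisson counts
    with gene-specific means drawn from a gamma distribution."""
    rng = np.random.default_rng(seed)
    gene_means = rng.gamma(shape=2.0, scale=2.0, size=n_genes)
    # lam of shape (n_genes,) broadcasts across all cells
    return rng.poisson(gene_means, size=(n_cells, n_genes))

def add_dropouts(true_counts, midpoint=1.0, steepness=1.5, seed=1):
    """Zero out entries with probability
    1 / (1 + exp(steepness * (log1p(x) - midpoint))),
    so lowly expressed entries drop out more often (assumed dropout model).
    Returns the observed matrix and a mask of entries that were truly
    expressed but observed as zero."""
    rng = np.random.default_rng(seed)
    p_drop = 1.0 / (1.0 + np.exp(steepness * (np.log1p(true_counts) - midpoint)))
    dropped = rng.random(true_counts.shape) < p_drop
    observed = np.where(dropped, 0, true_counts)
    return observed, dropped & (true_counts > 0)

true_counts = simulate_counts()
observed, dropout_mask = add_dropouts(true_counts)
print(f"zero fraction, true: {(true_counts == 0).mean():.2f}")
print(f"zero fraction, observed: {(observed == 0).mean():.2f}")
print(f"expressed entries lost to dropout: {dropout_mask.sum()}")
```

Because the dropout probability depends on the true value, the observed zeros mix genuine absences with technical losses; imputation methods try to tell these apart and fill in the latter.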

RESULTS

In this study, we evaluated 11 of the most recent imputation methods on 12 real biological datasets from immunological studies and 4 simulated datasets. We compared their performance in terms of numerical recovery, cell clustering and marker gene analysis. Most methods improved numerical recovery to some degree, and their performance varied to some extent among sequencing protocols. In the cell clustering analysis, no method performed consistently well across all datasets. Some methods performed poorly on real datasets but excellently on simulated ones. Surprisingly and importantly, some methods had a negative effect on cell clustering. In the marker gene analysis, some methods identified potentially novel cell subsets. However, not all marker genes were successfully recovered by imputation, suggesting that challenges remain.
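Numerical recovery can be scored when the true values of dropped entries are known, as in simulations. The sketch below shows one hypothetical way to do this: it masks a random 40% of nonzero counts in a toy two-cell-type dataset, fills zeros with a deliberately simple k-nearest-neighbour average (`knn_impute`, a stand-in and not one of the 11 evaluated methods), and reports the root-mean-square error on the log1p scale over the masked entries only (`log_rmse`). The metric, masking scheme and imputer are assumptions for illustration; the abstract does not specify how recovery was scored.

```python
import numpy as np

def knn_impute(counts, k=10):
    """Hypothetical toy imputer: replace each zero with the mean count of that
    gene across the cell's k nearest neighbours (Euclidean distance on log1p counts).
    Nonzero entries are left unchanged."""
    log_x = np.log1p(counts.astype(float))
    sq_norm = (log_x ** 2).sum(axis=1)
    dist = sq_norm[:, None] + sq_norm[None, :] - 2.0 * log_x @ log_x.T
    np.fill_diagonal(dist, np.inf)                     # a cell is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]       # (cells, k) neighbour indices
    neighbour_mean = counts[neighbours].mean(axis=1)   # (cells, genes)
    return np.where(counts == 0, neighbour_mean, counts)

def log_rmse(true_counts, values, mask):
    """RMSE on the log1p scale, restricted to entries where mask is True."""
    diff = np.log1p(true_counts[mask]) - np.log1p(values[mask])
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy data: two hypothetical cell types with different mean expression per gene
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
gene_means = rng.gamma(2.0, 2.0, size=(2, 40))
true_counts = rng.poisson(gene_means[labels])          # (200 cells, 40 genes)

# Mask 40% of the truly expressed entries uniformly at random
masked = (rng.random(true_counts.shape) < 0.4) & (true_counts > 0)
observed = np.where(masked, 0, true_counts)

imputed = knn_impute(observed, k=10)
rmse_raw = log_rmse(true_counts, observed, masked)
rmse_imputed = log_rmse(true_counts, imputed, masked)
print(f"log-RMSE on masked entries, raw: {rmse_raw:.3f}, imputed: {rmse_imputed:.3f}")
```

A lower error after imputation than before indicates some recovery, and `log_rmse` can score any imputer that returns a matrix of the same shape. Restricting the error to masked entries keeps the many untouched entries from diluting the comparison.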

CONCLUSIONS

In summary, each imputation method affected different datasets differently, suggesting that imputation performance may be dataset-specific. Our study reveals the benefits and limitations of various imputation methods and provides data-driven guidance for scRNA-seq data analysis.
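Findings such as a negative effect on cell clustering, or dataset-specific behaviour, come down to comparing a clustering-agreement score with and without imputation, one dataset at a time. The abstract does not name the score; the adjusted Rand index (ARI) is a common choice and is assumed here. The sketch computes ARI from the contingency table with NumPy and reports the change after imputation (`clustering_gain`, a hypothetical helper); the label vectors are made-up toy inputs, not results from the study.

```python
import numpy as np

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions, from the pair-counting
    contingency table. 1.0 means identical partitions (up to label names);
    values near 0 mean chance-level agreement."""
    _, t = np.unique(labels_true, return_inverse=True)
    _, p = np.unique(labels_pred, return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)            # cells per (reference, predicted) pair

    def pairs(v):                          # unordered pairs: v choose 2
        return v * (v - 1) / 2.0

    index = pairs(table).sum()
    row_pairs = pairs(table.sum(axis=1)).sum()
    col_pairs = pairs(table.sum(axis=0)).sum()
    expected = row_pairs * col_pairs / pairs(len(t))
    max_index = 0.5 * (row_pairs + col_pairs)
    if max_index == expected:              # avoid 0/0, e.g. both partitions a single cluster
        return 1.0
    return float((index - expected) / (max_index - expected))

def clustering_gain(reference, clusters_raw, clusters_imputed):
    """Change in ARI after imputation; a negative value means imputation hurt."""
    return (adjusted_rand_index(reference, clusters_imputed)
            - adjusted_rand_index(reference, clusters_raw))

# Hypothetical toy labels for one dataset (9 cells, 3 annotated types)
reference = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
clusters_raw = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2])
clusters_imputed = np.array([2, 2, 2, 0, 0, 0, 1, 1, 1])

print(f"ARI raw: {adjusted_rand_index(reference, clusters_raw):.3f}")
print(f"ARI imputed: {adjusted_rand_index(reference, clusters_imputed):.3f}")
print(f"gain: {clustering_gain(reference, clusters_raw, clusters_imputed):+.3f}")
```

Here the imputed clustering matches the reference exactly up to label names, so its ARI is 1 and the gain is positive. Repeating the comparison for every method and dataset gives a table of gains in which negative entries flag cases where imputation degraded clustering.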

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7eea/10386301/5d45ad3879d3/12859_2023_5417_Fig1_HTML.jpg

Similar articles

SSNMDI: a novel joint learning model of semi-supervised non-negative matrix factorization and data imputation for clustering of single-cell RNA-seq data.

Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad149.

Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data.

Comput Biol Med. 2022 Jul;146:105697. doi: 10.1016/j.compbiomed.2022.105697. Epub 2022 Jun 8.

Accurate and interpretable gene expression imputation on scRNA-seq data using IGSimpute.

Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad124.

Propensity score matching enables batch-effect-corrected imputation in single-cell RNA-seq analysis.

Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac275.

CL-Impute: A contrastive learning-based imputation for dropout single-cell RNA-seq data.

Comput Biol Med. 2023 Sep;164:107263. doi: 10.1016/j.compbiomed.2023.107263. Epub 2023 Jul 23.

Collaborative Structure-Preserved Missing Data Imputation for Single-Cell RNA-Seq Clustering.

IEEE/ACM Trans Comput Biol Bioinform. 2024 Sep-Oct;21(5):1480-1491. doi: 10.1109/TCBB.2024.3404013. Epub 2024 Oct 9.

GE-Impute: graph embedding-based imputation for single-cell RNA-seq data.

Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac313.

Bubble: a fast single-cell RNA-seq imputation using an autoencoder constrained by bulk RNA-seq data.

Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac580.

A systematic evaluation of single-cell RNA-sequencing imputation methods.

Genome Biol. 2020 Aug 27;21(1):218. doi: 10.1186/s13059-020-02132-x.

Cited by

Missing data in single-cell transcriptomes reveals transcriptional shifts.

bioRxiv. 2025 Aug 21:2025.08.15.669765. doi: 10.1101/2025.08.15.669765.

scTsI: an effective two-stage imputation method for single-cell RNA-seq data.

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf298.

An in-depth benchmark framework for evaluating single cell RNA-seq dropout imputation methods and the development of an improved algorithm afMF.

Clin Transl Med. 2025 Apr;15(4):e70283. doi: 10.1002/ctm2.70283.

A graph neural network that combines scRNA-seq and protein-protein interaction data.

Nat Methods. 2025 Apr;22(4):660-661. doi: 10.1038/s41592-025-02628-z.

scNET: learning context-specific gene and cell embeddings by integrating single-cell gene expression data with protein-protein interactions.

Nat Methods. 2025 Apr;22(4):708-716. doi: 10.1038/s41592-025-02627-0. Epub 2025 Mar 17.

PhyImpute and UniFracImpute: two imputation approaches incorporating phylogeny information for microbial count data.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae653.

Conserved transcription factors coordinate synaptic gene expression through repression.

bioRxiv. 2025 Feb 11:2024.10.30.621128. doi: 10.1101/2024.10.30.621128.

scTCA: a hybrid Transformer-CNN architecture for imputation and denoising of scDNA-seq data.

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae577.

SAE-Impute: imputation for single-cell data via subspace regression and auto-encoders.

BMC Bioinformatics. 2024 Oct 1;25(1):317. doi: 10.1186/s12859-024-05944-x.

γδ T-cells in human malignancies: insights from single-cell studies and analytical considerations.

Front Immunol. 2024 Aug 30;15:1438962. doi: 10.3389/fimmu.2024.1438962. eCollection 2024.