FRMC: a fast and robust method for the imputation of scRNA-seq data.

Affiliations

Wuhan National Laboratory for Optoelectronics, Huazhong University of Science & Technology, Wuhan, Hubei, China.

BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China.

Publication information

RNA Biol. 2021 Oct 15;18(sup1):172-181. doi: 10.1080/15476286.2021.1960688. Epub 2021 Aug 30.

DOI:10.1080/15476286.2021.1960688

PMID:34459719

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8682979/

Abstract

The high resolution of single-cell transcriptome sequencing allows researchers to observe gene expression profiles at the single-cell level, offering numerous possibilities for subsequent biomedical investigation. However, insufficient RNA input produces gene-cell expression matrices with a high proportion of missing values, an unavoidable technical effect that severely hampers the accuracy of downstream analysis. Addressing this problem requires a faster, more stable, and more accurate imputation method that not only recovers the missing data but also facilitates the subsequent analysis of biological mechanisms. Existing imputation methods all have drawbacks and limitations: some require a pre-assumed data distribution, some cannot distinguish between technical and biological zeros, and some have poor computational performance. In this paper, we present FRMC, a novel imputation software for single-cell RNA-Seq data that introduces a fast and accurate singular value thresholding approximation method. The experiments demonstrated that FRMC can not only precisely distinguish 'true zeros' from dropout events and correctly impute missing values attributed to technical noise, but also effectively enhance intracellular and intergenic connections and achieve accurate clustering of cells in biological applications. In summary, FRMC can be a powerful tool for analysing single-cell data because it simultaneously ensures biological significance, accuracy, and speed. FRMC is implemented in Python and is freely accessible to non-commercial users on GitHub: https://github.com/HUST-DataMan/FRMC.
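
For readers unfamiliar with singular value thresholding, the following is a minimal illustrative sketch of the generic technique applied to matrix completion. It is not FRMC's implementation and does not reproduce its approximation scheme. The function names `svt` and `complete_matrix`, the parameters `tau` and `n_iter`, and the synthetic low-rank matrix are all assumptions chosen for illustration.

```python
import numpy as np


def svt(matrix, tau):
    """Soft-threshold the singular values of `matrix` by `tau` (generic SVT operator)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # shrink each singular value, clip at zero
    return (u * s_shrunk) @ vt           # rebuild the matrix from the shrunken spectrum


def complete_matrix(observed, mask, tau=0.5, n_iter=200):
    """Iteratively fill unobserved entries with a low-rank estimate.

    `mask` is True where an entry was observed. Each iteration keeps the
    observed values, fills the rest with the current estimate, and then
    applies the thresholding operator (a soft-impute style loop).
    """
    estimate = np.zeros_like(observed, dtype=float)
    for _ in range(n_iter):
        filled = np.where(mask, observed, estimate)
        estimate = svt(filled, tau)
    return estimate


# Synthetic example (hypothetical data): a rank-3 matrix with ~30% of entries hidden.
rng = np.random.default_rng(0)
truth = rng.random((30, 3)) @ rng.random((3, 20))
mask = rng.random(truth.shape) > 0.3
observed = np.where(mask, truth, 0.0)

imputed = complete_matrix(observed, mask)

# Compare the error on hidden entries against leaving them at zero.
err_zero = np.sqrt(np.mean((truth[~mask] - 0.0) ** 2))
err_imputed = np.sqrt(np.mean((truth[~mask] - imputed[~mask]) ** 2))
print("RMSE on hidden entries, zero-filled:", err_zero)
print("RMSE on hidden entries, imputed:", err_imputed)
```

Note that this toy loop treats every hidden entry as missing. Distinguishing biological zeros from dropout events, as the abstract describes, needs additional logic that this sketch does not attempt.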
