Accurate and interpretable gene expression imputation on scRNA-seq data using IGSimpute.

Affiliations

Department of Computer Science, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong.

School of Chinese Medicine, Hong Kong Baptist University, Waterloo Road, Kowloon Tong, Hong Kong.

Publication information

Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad124.

Abstract

Single-cell ribonucleic acid sequencing (scRNA-seq) enables the quantification of gene expression at the transcriptomic level with single-cell resolution, enhancing our understanding of cellular heterogeneity. However, the excessive missing values present in scRNA-seq data hinder downstream analysis. While numerous imputation methods have been proposed to recover scRNA-seq data, high imputation performance often comes with low or no interpretability. Here, we present IGSimpute, an accurate and interpretable imputation method that recovers missing values in scRNA-seq data using an instance-wise gene selection layer (GSL). With mean squared error as the benchmark metric, IGSimpute achieves the lowest error among 13 imputation methods (itself and 12 other state-of-the-art methods) on 13 out of 17 datasets from different scRNA-seq technologies. We demonstrate that IGSimpute can give unbiased estimates of the missing values compared with other methods, regardless of whether the average gene expression values are small or large. Clustering results of imputed profiles show that IGSimpute offers a statistically significant improvement over other imputation methods. Taking the heart-and-aorta and the limb muscle tissues as examples, we show that IGSimpute can also denoise gene expression profiles by removing outlier entries with unexpectedly high expression values via the instance-wise GSL. We also show that genes selected by the instance-wise GSL could indicate the age of B cells from bladder fat tissue of the Tabula Muris Senis atlas. IGSimpute can impute one million cells in 64 min and is thus applicable to large datasets.
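To make the benchmark metric concrete, the following is a minimal sketch of how mean squared error is commonly computed for an imputation method: hide a fraction of the observed entries, impute, and compare only on the hidden entries. The abstract does not describe the masking protocol used in the paper, so the masking fraction, the helper names (`mask_entries`, `masked_mse`, `gene_mean_impute`), and the gene-mean baseline are all assumptions for illustration. The baseline is a stand-in imputer and is not IGSimpute.

```python
import numpy as np


def mask_entries(X, frac, rng):
    """Hide a random fraction of the nonzero entries (assumed protocol).

    Returns the masked copy and a boolean mask marking the hidden entries.
    """
    rows, cols = np.nonzero(X)
    n_hide = int(round(frac * rows.size))
    pick = rng.choice(rows.size, size=n_hide, replace=False)
    mask = np.zeros(X.shape, dtype=bool)
    mask[rows[pick], cols[pick]] = True
    X_masked = X.copy()
    X_masked[mask] = 0.0
    return X_masked, mask


def masked_mse(X_true, X_imputed, mask):
    """Mean squared error restricted to the hidden entries."""
    diff = X_true[mask] - X_imputed[mask]
    return float(np.mean(diff ** 2))


def gene_mean_impute(X_masked):
    """Hypothetical baseline: fill zeros with the gene's mean over nonzero cells."""
    nonzero = X_masked != 0
    counts = nonzero.sum(axis=0)
    sums = X_masked.sum(axis=0)
    gene_means = np.divide(
        sums, counts, out=np.zeros_like(sums, dtype=float), where=counts > 0
    )
    # Keep observed values; broadcast per-gene means into the zero entries.
    return np.where(nonzero, X_masked, gene_means)


# Toy cells-by-genes matrix standing in for a real expression matrix.
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(200, 50)).astype(float)
X_masked, mask = mask_entries(X, frac=0.1, rng=rng)
X_hat = gene_mean_impute(X_masked)
print("masked-entry MSE:", masked_mse(X, X_hat, mask))
```

Any imputer that maps a cells-by-genes matrix to a completed matrix of the same shape could be dropped in place of `gene_mean_impute`, and comparing methods then amounts to comparing their `masked_mse` values on the same mask.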
