Correlation Imputation in Single cell RNA-seq using Auxiliary Information and Ensemble Learning.

Authors

Gan Luqin, Vinci Giuseppe, Allen Genevera I

Affiliations

Rice University.

University of Notre Dame.

Publication information

ACM BCB. 2020 Sep;2020. doi: 10.1145/3388440.3412462.

Abstract

Single-cell RNA sequencing (scRNA-seq) is a powerful technique that measures the gene expression of individual cells in a high-throughput fashion. However, sequencing inefficiency produces dropout events, technical artifacts in which genes erroneously appear to have zero expression, and these make the data unreliable. Many data imputation methods have been proposed to alleviate this issue. Yet effective imputation can be difficult and biased because the data is sparse and high-dimensional, and the resulting errors can cause major distortions in downstream analyses. In this paper, we propose a novel approach that imputes the gene-by-gene correlations rather than the data itself. We call this method SCENA: Single cell RNA-seq Correlation completion by ENsemble learning and Auxiliary information. The SCENA gene-by-gene correlation matrix estimate is obtained by model stacking of multiple imputed correlation matrices based on known auxiliary information about gene connections. In an extensive simulation study based on real scRNA-seq data, we demonstrate that SCENA not only accurately imputes gene correlations but also outperforms existing imputation approaches in downstream analyses such as dimension reduction, cell clustering, and graphical model estimation.
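The idea of stacking several imputed correlation matrices against auxiliary information can be illustrated with a small sketch. This is not the SCENA implementation: the abstract does not specify the stacking model, so the sketch assumes the auxiliary information is a 0/1 matrix of known gene connections and fits non-negative least-squares weights that make the absolute candidate correlations predict it. The function names, the synthetic data, and the naive mean imputation are all hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def upper_triangle(mat):
    """Return the strictly upper-triangular entries of a square matrix."""
    rows, cols = np.triu_indices(mat.shape[0], k=1)
    return mat[rows, cols]


def stack_correlations(candidates, auxiliary):
    """Combine candidate gene-by-gene correlation matrices into one estimate.

    candidates: list of (genes x genes) correlation matrices, for example
                one per imputation method (hypothetical inputs).
    auxiliary:  (genes x genes) 0/1 matrix of known gene connections
                (an assumed form of the auxiliary information).
    """
    # One column of absolute correlations per candidate matrix.
    features = np.column_stack([np.abs(upper_triangle(c)) for c in candidates])
    target = upper_triangle(auxiliary).astype(float)

    # Non-negative least squares keeps every stacking weight >= 0.
    weights, _ = nnls(features, target)
    if weights.sum() == 0:
        # Fall back to a plain average if no candidate explains the target.
        weights = np.ones(len(candidates))
    weights = weights / weights.sum()

    stacked = sum(w * c for w, c in zip(weights, candidates))
    return stacked, weights


# Synthetic example: two modules of three genes each, with 30% dropout.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = np.zeros((2, 6))
loadings[0, :3] = 1.0
loadings[1, 3:] = 1.0
expression = latent @ loadings + 0.5 * rng.normal(size=(200, 6)) + 3.0
observed = np.where(rng.random(expression.shape) < 0.3, 0.0, expression)

# Candidate 1: correlations of the raw data with dropout zeros.
# Candidate 2: correlations after a naive per-gene mean imputation.
gene_means = np.array([col[col != 0].mean() for col in observed.T])
imputed = np.where(observed == 0, gene_means, observed)
candidates = [
    np.corrcoef(observed, rowvar=False),
    np.corrcoef(imputed, rowvar=False),
]

# Auxiliary information: genes in the same module are "connected".
auxiliary = (loadings.T @ loadings > 0).astype(int)

stacked, weights = stack_correlations(candidates, auxiliary)
print("stacking weights:", weights)
```

Because the weights are non-negative and sum to one, the stacked matrix is a convex combination of the candidates, so it stays symmetric with a unit diagonal.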

Similar articles

2
Correlation Imputation for Single-Cell RNA-seq.
J Comput Biol. 2022 May;29(5):465-482. doi: 10.1089/cmb.2021.0403. Epub 2022 Mar 21.
3
EnTSSR: A Weighted Ensemble Learning Method to Impute Single-Cell RNA Sequencing Data.
IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2781-2787. doi: 10.1109/TCBB.2021.3110850. Epub 2021 Dec 8.

Cited by

1
Graphical Model Inference with Erosely Measured Data.
J Am Stat Assoc. 2024;119(547):2282-2293. doi: 10.1080/01621459.2023.2256503. Epub 2023 Oct 20.

References

7
SAVER: gene expression recovery for single-cell RNA sequencing.
Nat Methods. 2018 Jul;15(7):539-542. doi: 10.1038/s41592-018-0033-z. Epub 2018 Jun 25.