Correlation Imputation in Single cell RNA-seq using Auxiliary Information and Ensemble Learning.

Authors

Gan Luqin, Vinci Giuseppe, Allen Genevera I

Affiliations

Rice University.

University of Notre Dame.

Publication details

ACM BCB. 2020 Sep;2020. doi: 10.1145/3388440.3412462.

DOI: 10.1145/3388440.3412462

PMID: 34278382

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8281968/

Abstract

Single-cell RNA sequencing (scRNA-seq) is a powerful technique that measures the gene expression of individual cells in a high-throughput fashion. However, because of sequencing inefficiency, the data is unreliable: it contains dropout events, technical artifacts in which genes erroneously appear to have zero expression. Many data imputation methods have been proposed to alleviate this issue. Yet effective imputation can be difficult and biased because the data is sparse and high-dimensional, which results in major distortions in downstream analyses. In this paper, we propose a completely novel approach that imputes the gene-by-gene correlations rather than the data itself. We call this method SCENA: Single cell RNA-seq Correlation completion by ENsemble learning and Auxiliary information. The SCENA gene-by-gene correlation matrix estimate is obtained by model stacking of multiple imputed correlation matrices based on known auxiliary information about gene connections. In an extensive simulation study based on real scRNA-seq data, we demonstrate that SCENA not only accurately imputes gene correlations but also outperforms existing imputation approaches in downstream analyses such as dimension reduction, cell clustering, and graphical model estimation.
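To make the idea of stacking correlation matrices concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' SCENA implementation. It assumes that dropouts are marked as NaN, that the reference correlation is computed from co-observed cells only, and that stacking means a plain least-squares fit of weights over two candidate matrices. The function names `pairwise_complete_corr` and `stack_correlations`, the simulated data, and the "auxiliary" matrix are all hypothetical and exist only for illustration.

```python
import numpy as np

def pairwise_complete_corr(X, min_shared=3):
    """Pearson correlation per gene pair, using only cells where both genes
    are observed (non-NaN). Pairs with too few shared cells stay NaN."""
    n_genes = X.shape[1]
    C = np.full((n_genes, n_genes), np.nan)
    for i in range(n_genes):
        for j in range(i, n_genes):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if both.sum() < min_shared:
                continue
            xi, xj = X[both, i], X[both, j]
            if xi.std() == 0 or xj.std() == 0:
                continue
            C[i, j] = C[j, i] = np.corrcoef(xi, xj)[0, 1]
    return C

def stack_correlations(target, candidates):
    """Assumed stacking rule: least-squares weights that make the weighted sum
    of candidate matrices match `target` on its observed off-diagonal entries."""
    n = target.shape[0]
    mask = ~np.isnan(target) & ~np.eye(n, dtype=bool)
    A = np.column_stack([c[mask] for c in candidates])
    w, *_ = np.linalg.lstsq(A, target[mask], rcond=None)
    combined = sum(wk * c for wk, c in zip(w, candidates))
    combined = np.clip(combined, -1.0, 1.0)
    np.fill_diagonal(combined, 1.0)
    return combined, w

# Simulated example (all values are synthetic).
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 5
L = rng.normal(size=(n_genes, 2))
cov = L @ L.T + np.eye(n_genes)
d = np.sqrt(np.diag(cov))
true_corr = cov / np.outer(d, d)

X = rng.multivariate_normal(np.zeros(n_genes), cov, size=n_cells)
X[rng.random(X.shape) < 0.5] = np.nan          # simulated dropouts

# Candidate 1: hypothetical auxiliary estimate (noisy copy of the truth).
noise = rng.normal(scale=0.1, size=(n_genes, n_genes))
aux = np.clip(true_corr + (noise + noise.T) / 2, -1.0, 1.0)
np.fill_diagonal(aux, 1.0)
# Candidate 2: naive correlation that treats dropouts as true zeros.
naive = np.corrcoef(np.nan_to_num(X), rowvar=False)

target = pairwise_complete_corr(X)
combined, weights = stack_correlations(target, [aux, naive])
print("stacking weights:", weights)
```

The sketch only shows the general shape of the computation: several complete candidate matrices are weighted so that their combination agrees with the correlations that can be estimated directly, and the combination then supplies values for every gene pair.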

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ad7/8281968/0a7e633c799d/nihms-1715319-f0001.jpg

Similar articles

Correlation Imputation for Single-Cell RNA-seq.

J Comput Biol. 2022 May;29(5):465-482. doi: 10.1089/cmb.2021.0403. Epub 2022 Mar 21.

EnTSSR: A Weighted Ensemble Learning Method to Impute Single-Cell RNA Sequencing Data.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2781-2787. doi: 10.1109/TCBB.2021.3110850. Epub 2021 Dec 8.

A flexible network-based imputing-and-fusing approach towards the identification of cell types from single-cell RNA-seq data.

BMC Bioinformatics. 2020 Jun 11;21(1):240. doi: 10.1186/s12859-020-03547-w.

ScLRTC: imputation for single-cell RNA-seq data via low-rank tensor completion.

BMC Genomics. 2021 Nov 29;22(1):860. doi: 10.1186/s12864-021-08101-3.

scNPF: an integrative framework assisted by network propagation and network fusion for preprocessing of single-cell RNA-seq data.

BMC Genomics. 2019 May 8;20(1):347. doi: 10.1186/s12864-019-5747-5.

SIMPLEs: a single-cell RNA sequencing imputation strategy preserving gene modules and cell clusters variation.

NAR Genom Bioinform. 2020 Dec;2(4):lqaa077. doi: 10.1093/nargab/lqaa077. Epub 2020 Sep 28.

scIALM: A method for sparse scRNA-seq expression matrix imputation using the Inexact Augmented Lagrange Multiplier with low error.

Comput Struct Biotechnol J. 2024 Jan 2;23:549-558. doi: 10.1016/j.csbj.2023.12.027. eCollection 2024 Dec.

Are dropout imputation methods for scRNA-seq effective for scHi-C data?

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa289.

AutoImpute: Autoencoder based imputation of single-cell RNA-seq data.

Sci Rep. 2018 Nov 5;8(1):16329. doi: 10.1038/s41598-018-34688-x.

Cited by

Graphical Model Inference with Erosely Measured Data.

J Am Stat Assoc. 2024;119(547):2282-2293. doi: 10.1080/01621459.2023.2256503. Epub 2023 Oct 20.
