Vaeda computationally annotates doublets in single-cell RNA sequencing data.

Author affiliations

Department of Developmental Biology, University of Pittsburgh, Pittsburgh, PA 15201, USA.

Carnegie Mellon-University of Pittsburgh Joint PhD Program, University of Pittsburgh, Pittsburgh, PA 15201, USA.

Publication information

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac720.

DOI:10.1093/bioinformatics/btac720

PMID:36342203

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9805559/

Abstract

MOTIVATION

Single-cell RNA sequencing (scRNA-seq) continues to expand our knowledge by facilitating the study of transcriptional heterogeneity at the level of single cells. Despite this technology's utility and success in biomedical research, technical artifacts are present in scRNA-seq data. Doublets/multiplets are a type of artifact that occurs when two or more cells are tagged by the same barcode, and therefore they appear as a single cell. Because this introduces non-existent transcriptional profiles, doublets can bias and mislead downstream analysis. To address this limitation, computational methods to annotate and remove doublets from scRNA-seq datasets are needed.
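
To make the artifact concrete, the following minimal sketch shows how two cells sharing one barcode could yield a profile that matches neither cell. It is an illustration only, not vaeda's implementation: the function names, the four-gene count vectors, and the cell-type labels are hypothetical, and a doublet is simply assumed to be the per-gene sum of the two cells' counts.

```python
def simulate_doublet(cell_a, cell_b):
    """Sum per-gene counts of two cells assumed to share one barcode."""
    return [a + b for a, b in zip(cell_a, cell_b)]


def normalize(counts):
    """Convert raw counts to per-gene fractions of the cell's total."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical counts for four genes in two distinct cell types.
t_cell = [90, 10, 0, 0]   # expresses only genes 0 and 1
b_cell = [0, 0, 20, 80]   # expresses only genes 2 and 3

doublet = simulate_doublet(t_cell, b_cell)
print(doublet)             # [90, 10, 20, 80]
print(normalize(doublet))  # [0.45, 0.05, 0.1, 0.4]
```

The combined profile expresses all four genes and has a library size equal to the sum of both cells, a pattern that neither real cell type shows, which is why such barcodes can mislead clustering and other downstream steps.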

RESULTS

We introduce vaeda (Variational Auto-Encoder for Doublet Annotation), a new approach for computational annotation of doublets in scRNA-seq data. Vaeda integrates a variational auto-encoder and Positive-Unlabeled learning to produce doublet scores and binary doublet calls. We apply vaeda, along with seven existing doublet annotation methods, to 16 benchmark datasets and find that vaeda performs competitively in terms of doublet scores and doublet calls. Notably, vaeda outperforms other Python-based methods for doublet annotation. Altogether, vaeda is a robust and competitive method for scRNA-seq doublet annotation and may be of particular interest in the context of Python-based workflows.

AVAILABILITY AND IMPLEMENTATION

Vaeda is available at https://github.com/kostkalab/vaeda, and the version used for the results we present here is archived at Zenodo (https://doi.org/10.5281/zenodo.7199783).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
