demuxSNP: supervised demultiplexing single-cell RNA sequencing using cell hashing and SNPs.

Affiliations

School of Medicine, Limerick Digital Cancer Research Centre, Health Research Institute (HRI), University of Limerick, Limerick V94 T9PX, Ireland.

Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA 02215, USA.

Publication information

Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae090.

Abstract

BACKGROUND

Multiplexing single-cell RNA sequencing experiments reduces sequencing cost and facilitates larger-scale studies. However, factors such as cell hashing quality and class size imbalance impact demultiplexing algorithm performance, reducing cost-effectiveness.

FINDINGS

We propose a supervised algorithm, demuxSNP, which leverages both cell hashing and genetic variation between individuals (single-nucleotide polymorphisms [SNPs]). demuxSNP addresses fundamental limitations in demultiplexing methods that use only one data modality. Some cells may be confidently demultiplexed using probabilistic hashing methods. demuxSNP uses these cells to infer the genotype of singlet and doublet clusters, then classifies cells assigned as negative, uncertain, or doublet using a nearest-neighbor approach adapted for missing data. We benchmarked demuxSNP against hashing, genotype-free SNP, and hybrid methods on simulated and real data from renal cell cancer. demuxSNP outperformed standalone hashing methods on a low-quality hashing data benchmark, improved overall classification accuracy, and allowed more cells with high RNA quality to be recovered. By varying simulated doublet rates, we showed that genotype-free SNP methods, and hybrid methods that leverage them, were impacted by class size imbalance and doublet rate. demuxSNP's supervised approach was more robust to doublet rate in experiments with class size imbalance.
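The supervised step described above can be illustrated with a minimal Python sketch. It is not the demuxSNP implementation or its API (the package itself is written in R). The SNP encoding (1 = alternate allele observed, -1 = only reference observed, 0 = no reads), the distance function, the toy profiles, and the names `masked_distance` and `knn_predict` are all assumptions made for illustration. The idea shown is a k-nearest-neighbor vote in which SNPs missing in either cell are ignored when comparing two cells, with confidently hashed cells serving as labeled training data.

```python
from collections import Counter

# Assumed encoding for this sketch (hypothetical, not taken from the paper):
#   1 = alternate allele observed, -1 = only reference observed, 0 = no reads.


def masked_distance(a, b):
    """Fraction of mismatching SNPs among those observed in both cells."""
    shared = [(x, y) for x, y in zip(a, b) if x != 0 and y != 0]
    if not shared:
        # No jointly observed SNPs: treat the pair as maximally distant.
        return 1.0
    mismatches = sum(1 for x, y in shared if x != y)
    return mismatches / len(shared)


def knn_predict(train_profiles, train_labels, query, k=3):
    """Majority vote among the k training cells closest to the query cell."""
    ranked = sorted(
        range(len(train_profiles)),
        key=lambda i: masked_distance(train_profiles[i], query),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set: cells that hashing assigned with high confidence.
# The doublet profiles are made up here as a mix of both donors' alleles.
train_profiles = [
    [1, -1, 1, 0, -1],
    [1, -1, 0, 1, -1],
    [-1, 1, -1, 1, 0],
    [-1, 1, -1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
]
train_labels = ["donorA", "donorA", "donorB", "donorB", "doublet", "doublet"]

# Toy query cells: cells that hashing left as negative or uncertain.
queries = [
    [1, 0, 1, 0, -1],
    [0, 1, -1, 0, 1],
    [1, 1, 0, 1, 1],
]

predictions = [knn_predict(train_profiles, train_labels, q) for q in queries]
print(predictions)
```

On this toy input the three query cells are assigned to donorA, donorB, and doublet, respectively. Restricting the distance to jointly observed SNPs is one simple way to handle the sparsity of single-cell SNP calls; other distance measures would also be reasonable choices.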

CONCLUSIONS

demuxSNP uses hashing and SNP data to demultiplex datasets with low hashing quality where biological samples are genetically distinct. Unassigned or negative cells with high RNA quality are recovered, making more cells available for analysis. Data simulation and benchmarking pipelines as well as processed benchmarking data for 5-50% doublets are publicly available. demuxSNP is available as an R/Bioconductor package (https://doi.org/doi:10.18129/B9.bioc.demuxSNP).
