Systematic benchmark of single-cell hashtag demultiplexing approaches reveals robust performance of a clustering-based method.

Authors

Sayed Mohammed, Wang Yue Julia, Lim Hee-Woong

Affiliations

Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Ave., Cincinnati, OH 45229, United States.

Department of Biomedical Sciences, College of Medicine, Florida State University, 1115 W Call St, Tallahassee, FL 32306, United States.

Publication information

Brief Funct Genomics. 2025 Jan 15;24. doi: 10.1093/bfgp/elae039.

DOI: 10.1093/bfgp/elae039

PMID: 39387404

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11735735/

Abstract

Single-cell technology opened up a new avenue to delineate cellular status at a single-cell resolution and has become an essential tool for studying human diseases. Multiplexing allows cost-effective experiments by combining multiple samples and effectively mitigates batch effects. It starts by giving each sample a unique tag and then pooling them together for library preparation and sequencing. After sequencing, sample demultiplexing is performed based on tag detection, where cells belonging to one sample are expected to have a higher amount of the corresponding tag than cells from other samples. However, in reality, demultiplexing is not straightforward due to the noise and contamination from various sources. Successful demultiplexing depends on the efficient removal of such contamination. Here, we perform a systematic benchmark combining different normalization methods and demultiplexing approaches using real-world data and simulated datasets. We show that accounting for sequencing depth variability increases the separability between tagged and untagged cells, and the clustering-based approach outperforms existing tools. The clustering-based workflow is available as an R package from https://github.com/hwlim/hashDemux.
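The two ingredients the abstract highlights, a normalization that accounts for per-cell sequencing depth and a clustering step that groups cells by their tag profile, can be illustrated with a small sketch. This is not the hashDemux package's algorithm (that workflow is in the R package linked above). It is a minimal Python illustration under stated assumptions: a per-cell centered log-ratio (CLR) stands in for the depth-aware normalization, and plain k-means with one cluster per hashtag stands in for the clustering step. The function names `clr_per_cell`, `kmeans`, and `demultiplex` are hypothetical, and negative (untagged) and doublet calls, which real tools must handle, are deliberately omitted.

```python
import math
from typing import List, Sequence, Tuple


def clr_per_cell(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """Assumed normalization: per-cell centered log-ratio.

    Each cell's log1p(count) values are centered on that cell's own mean,
    which reduces the influence of how deeply the cell's hashtags were
    sequenced, so shallow and deep cells from one sample look alike.
    """
    normalized = []
    for row in counts:
        logs = [math.log1p(c) for c in row]
        mean = sum(logs) / len(logs)
        normalized.append([v - mean for v in logs])
    return normalized


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int,
           n_iter: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means with a deterministic seed (a stand-in clusterer).

    Centroid j is seeded with the cell that has the largest value for tag j,
    assuming k equals the number of hashtags (one cluster per sample).
    """
    centroids = [list(max(points, key=lambda p: p[j])) for j in range(k)]
    assignment = [-1] * len(points)
    for _ in range(n_iter):
        new_assignment = [
            min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no cell changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def demultiplex(counts: Sequence[Sequence[int]], tags: Sequence[str]) -> List[str]:
    """Assign each cell the hashtag that dominates its cluster's centroid."""
    norm = clr_per_cell(counts)
    assignment, centroids = kmeans(norm, k=len(tags))
    # Label each cluster by the tag with the highest centroid value.
    cluster_tag = [
        tags[max(range(len(tags)), key=lambda j: cen[j])] for cen in centroids
    ]
    return [cluster_tag[a] for a in assignment]


if __name__ == "__main__":
    # Toy hashtag counts (cells x tags); each sample has a deep and a shallow cell.
    counts = [
        [200, 5, 3],
        [20, 1, 0],
        [4, 150, 6],
        [1, 15, 2],
        [2, 3, 300],
        [0, 1, 30],
    ]
    print(demultiplex(counts, ["A", "B", "C"]))
    # → ['A', 'A', 'B', 'B', 'C', 'C']
```

Because the CLR centers each cell on its own mean, the shallow cell `[20, 1, 0]` lands near the deep cell `[200, 5, 3]` rather than near other low-count cells, which is the kind of separability gain from depth-aware normalization that the abstract describes. Clustering then labels whole groups of cells at once instead of thresholding each tag independently.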
