Department of Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada.
Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Nat Methods. 2019 Oct;16(10):1007-1015. doi: 10.1038/s41592-019-0529-1. Epub 2019 Sep 9.
Single-cell RNA sequencing has enabled the decomposition of complex tissues into functionally distinct cell types. Often, investigators wish to assign cells to cell types through unsupervised clustering followed by manual annotation or via 'mapping' to existing data. However, manual interpretation scales poorly to large datasets, mapping approaches require purified or pre-annotated data, and both are prone to batch effects. To overcome these issues, we present CellAssign, a probabilistic model that leverages prior knowledge of cell-type marker genes to annotate single-cell RNA sequencing data into predefined or de novo cell types. CellAssign automates the process of assigning cells in a highly scalable manner across large datasets while controlling for batch and sample effects. We demonstrate the advantages of CellAssign through extensive simulations and analysis of tumor microenvironment composition in high-grade serous ovarian cancer and follicular lymphoma.
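To make the idea of "leveraging prior knowledge of marker genes" in a probabilistic model concrete, the following is a minimal sketch, not the published CellAssign implementation. It assumes a binary marker matrix `rho` (genes x cell types, 1 meaning the gene is a marker of that type), a per-cell size factor, a per-gene baseline log expression, a single fixed overexpression term `delta`, and a Poisson count likelihood. All of these names, and the choice of a fixed-parameter Poisson model, are hypothetical simplifications chosen here for illustration. The published model is described as learning its parameters from the data and controlling for batch and sample effects, which this sketch does not attempt. Given fixed parameters, each cell's posterior probability over cell types follows from Bayes' rule.

```python
import math
from typing import List, Sequence


def log_poisson(y: int, mu: float) -> float:
    """Log-probability of count y under a Poisson distribution with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def assign_cells(
    counts: Sequence[Sequence[int]],  # cells x genes count matrix
    rho: Sequence[Sequence[int]],     # hypothetical genes x types binary marker matrix
    size_factors: Sequence[float],    # hypothetical per-cell library-size factors s_n
    base_log_expr: Sequence[float],   # hypothetical per-gene baseline log expression
    delta: float,                     # hypothetical log-fold overexpression of a marker in its own type
    prior: Sequence[float],           # cell-type proportions (must be > 0)
) -> List[List[float]]:
    """Return, for each cell, the posterior probability of each cell type.

    Simplified sketch: gene g in cell n of type c is assumed Poisson with mean
    s_n * exp(base_log_expr[g] + delta * rho[g][c]); parameters are fixed, not fitted.
    """
    posteriors = []
    for y_n, s_n in zip(counts, size_factors):
        log_joint = []
        for c in range(len(prior)):
            lp = math.log(prior[c])  # log prior of type c
            for g, y in enumerate(y_n):
                # Markers of type c get their mean boosted by exp(delta).
                mu = s_n * math.exp(base_log_expr[g] + delta * rho[g][c])
                lp += log_poisson(y, mu)
            log_joint.append(lp)
        # Normalise with log-sum-exp for numerical stability.
        m = max(log_joint)
        z = m + math.log(sum(math.exp(v - m) for v in log_joint))
        posteriors.append([math.exp(v - z) for v in log_joint])
    return posteriors


# Toy example: gene 0 marks type 0, gene 1 marks type 1, gene 2 is not a marker.
rho = [[1, 0], [0, 1], [0, 0]]
counts = [[20, 1, 5], [1, 20, 5]]
post = assign_cells(
    counts, rho, [1.0, 1.0], [0.0, 0.0, math.log(5)], math.log(20), [0.5, 0.5]
)
labels = [max(range(len(p)), key=p.__getitem__) for p in post]
print(labels)  # [0, 1]
```

In this toy setting, the first cell expresses the type-0 marker highly and is assigned to type 0, while the second cell is assigned to type 1. Because assignment rests on a shared marker matrix rather than on clusters or a reference dataset, the same scoring applies to any number of cells, which is the intuition behind the scalability claim; a fitted model would additionally estimate the expression parameters and covariate effects rather than fixing them by hand.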