Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Hannoversche Str. 28, 10115 Berlin, Germany.
Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Charitéplatz 1, 10117 Berlin, Germany.
Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae512.
Single-cell RNA sequencing (scRNA-seq) data are widely used to study cancer cell states and their heterogeneity. However, the tumour microenvironment is usually a mixture of healthy and cancerous cells, and it can be difficult to fully separate these two populations based on transcriptomics alone. When available, somatic single-nucleotide variants (SNVs) observed in the scRNA-seq data can be used to identify the cancer population and to match that information with each cell's expression profile. However, calling somatic SNVs in scRNA-seq data is challenging, because most variants seen in the short-read data are not somatic but are instead germline variants, RNA edits, or transcription, sequencing, or processing errors. In addition, a variant is only observed in a given cell if it lies in a region that is actively transcribed in that cell.
To address these challenges, we develop CCLONE (Cancer Cell Labelling On Noisy Expression), an interpretable tool adapted to handle the uncertainty and sparsity of SNVs called from scRNA-seq data. CCLONE jointly identifies cancer clonal populations and their associated variants. We apply CCLONE to two acute myeloid leukaemia datasets and one lung adenocarcinoma dataset and show that it captures both genetic clones and somatic events for multiple patients. These results show how CCLONE can be used to gain insight into the course of the disease and the origin of cancer cells in scRNA-seq data.
Source code is available at github.com/HaghverdiLab/CCLONE.
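The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of jointly grouping cells and variants from sparse SNV calls. It is not taken from the CCLONE repository. It assumes a cell-by-variant matrix of variant allele frequencies (VAF), a binary weight matrix that masks entries with no read coverage, and a generic masked non-negative matrix factorisation with multiplicative updates. The function names `masked_nmf` and `weighted_error`, the toy read counts, and the choice of two components are all hypothetical.

```python
import numpy as np


def masked_nmf(X, W, k, n_iter=1000, seed=0, eps=1e-9):
    """Factor X ~ C @ V with non-negative C, V, ignoring entries where W == 0.

    Minimises sum(W * (X - C @ V) ** 2) with multiplicative updates.
    C holds per-cell component weights, V per-component variant loadings.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_vars = X.shape
    C = rng.uniform(0.1, 1.0, size=(n_cells, k))
    V = rng.uniform(0.1, 1.0, size=(k, n_vars))
    WX = W * X
    for _ in range(n_iter):
        C *= (WX @ V.T) / ((W * (C @ V)) @ V.T + eps)
        V *= (C.T @ WX) / (C.T @ (W * (C @ V)) + eps)
    return C, V


def weighted_error(X, W, C, V):
    """Squared reconstruction error over covered entries only."""
    return float(np.sum(W * (X - C @ V) ** 2))


# Toy read counts (hypothetical): 6 cells x 4 variants.
# Cells 0-2 carry variants 0-1, cells 3-5 carry variants 2-3.
# A zero in both matrices means the site was not covered in that cell.
alt = np.array([
    [3, 2, 0, 0],
    [4, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 4, 0],
    [0, 0, 0, 6],
])
ref = np.array([
    [3, 2, 4, 0],
    [4, 0, 0, 5],
    [0, 5, 3, 0],
    [5, 0, 2, 3],
    [0, 4, 4, 0],
    [3, 0, 0, 6],
])

cov = alt + ref
W = (cov > 0).astype(float)  # 1 where the site is covered, 0 where it is missing
# VAF where covered; uncovered entries are set to 0 and masked out by W.
X = np.divide(alt, cov, out=np.zeros(cov.shape, dtype=float), where=cov > 0)

C, V = masked_nmf(X, W, k=2)
labels = C.argmax(axis=1)           # dominant component per cell
variant_groups = V.argmax(axis=0)   # dominant component per variant

print("cell labels:", labels)
print("variant groups:", variant_groups)
print("weighted error:", weighted_error(X, W, C, V))
```

Masking matters here because an uncovered site says nothing about whether the cell carries the variant; treating it as a VAF of zero would pull cells with low coverage towards the variant-free group. A coverage-dependent weight could replace the binary mask under the same update rules.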