Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina 29208, USA.
Department of Environmental Health Science, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina 29208, USA.
Genome Res. 2024 Feb 7;34(1):85-93. doi: 10.1101/gr.278098.123.
The availability of single-cell sequencing (SCS) enables us to assess intra-tumor heterogeneity and identify cellular subclones without the confounding effect of mixed cells. Copy number aberrations (CNAs) have been commonly used to identify subclones in SCS data using various clustering methods, as cells comprising a subpopulation are found to share a genetic profile. However, currently available methods may generate spurious results (e.g., falsely identified variants) in the procedure of CNA detection, thereby diminishing the accuracy of subclone identification within a large, complex cell population. In this study, we developed a subclone clustering method based on a fused lasso model, referred to as FLCNA, which can simultaneously detect CNAs in single-cell DNA sequencing (scDNA-seq) data. Spike-in simulations were conducted to evaluate the clustering and CNA detection performance of FLCNA, benchmarking it against existing copy number estimation methods (SCOPE, HMMcopy) in combination with commonly used clustering methods. Application of FLCNA to a scDNA-seq data set of breast cancer revealed different genomic variation patterns in neoadjuvant chemotherapy-treated samples and pretreated samples. We show that FLCNA is a practical and powerful method for subclone identification and CNA detection with scDNA-seq data.
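For orientation, the generic fused lasso signal approximator that models of this kind build on can be written as below. This is a sketch of the standard textbook form, not the FLCNA objective itself, which the abstract does not state. The symbols are assumptions introduced here for illustration: y_i is a normalized read-depth signal at genomic bin i, \beta_i is the fitted mean for that bin, and \lambda_1, \lambda_2 are tuning parameters.

```latex
% Generic 1D fused lasso signal approximator (illustrative; not the FLCNA model).
% y_i      : observed signal at bin i (assumed notation)
% \beta_i  : fitted piecewise-constant mean at bin i (assumed notation)
% \lambda_1: sparsity penalty weight; \lambda_2: fusion penalty weight
\[
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{n}}
\; \frac{1}{2} \sum_{i=1}^{n} \left( y_i - \beta_i \right)^{2}
\;+\; \lambda_1 \sum_{i=1}^{n} \left| \beta_i \right|
\;+\; \lambda_2 \sum_{i=2}^{n} \left| \beta_i - \beta_{i-1} \right|
\]
```

The fusion term penalizes differences between adjacent bins, so the fitted profile is piecewise constant, and its change points can be read as candidate CNA breakpoints.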