Song Minfang, Ma Shuai, Wang Gong, Wang Yukun, Yang Zhenzhen, Xie Bin, Guo Tongkun, Huang Xingxu, Zhang Liye
Research Center for Life Sciences Computing, Zhejiang Lab, Kechuang Avenue, Zhongtai Sub-District, Yuhang District, Hangzhou, Zhejiang 311121, China.
School of Life Science and Technology, ShanghaiTech University, Haike Road, Pudong New District, Shanghai 201210, China.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf076.
Copy number alterations (CNAs) are an important type of genomic variation that plays a crucial role in the initiation and progression of cancer. With the rapid growth of single-cell RNA sequencing (scRNA-seq), several computational methods have been developed to infer CNAs from scRNA-seq data. However, to date, no independent study has comprehensively benchmarked their performance. Herein, we evaluated five state-of-the-art methods on four tasks: tumor versus normal cell classification, CNA profile accuracy, tumor subclone inference, and aneuploidy identification in non-malignant cells. Our results showed that Numbat outperformed the other methods across most evaluation criteria, while CopyKAT excelled in scenarios where the expression matrix alone was used as input. In specific tasks, SCEVAN showed the best performance in clonal breakpoint detection, and Numbat showed high sensitivity in detecting copy-number-neutral loss of heterozygosity (cnLOH). Additionally, we investigated how reference settings, inclusion of tumor microenvironment cells, tumor type, and tumor purity affect the performance of these tools. This study provides a valuable guideline for researchers selecting appropriate methods for their datasets.