Chen Xin, Fang Li Tai, Chen Zhong, Chen Wanqiu, Wu Hongjin, Zhu Bin, Moos Malcolm, Farmer Andrew, Zhang Xiaowen, Xiong Wei, Gong Shusheng, Jones Wendell, Mason Christopher E, Wu Shixiu, Xiao Chunlin, Wang Charles
Center for Genomics, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA.
Department of Basic Sciences, School of Medicine, Loma Linda University, Loma Linda, CA 92350, USA.
Precis Clin Med. 2025 Jun 4;8(2):pbaf011. doi: 10.1093/pcmedi/pbaf011. eCollection 2025 Jun.
Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful tool for cancer research, enabling in-depth characterization of tumor heterogeneity at the single-cell level. Recently, several scRNA-seq copy number variation (scCNV) inference methods have been developed, expanding the application of scRNA-seq to study genetic heterogeneity in cancer using transcriptomic data. However, the fidelity of these methods has not been investigated systematically.
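The following is a minimal sketch of how copy number can be inferred from transcriptomic data at all. It shows, in simplified form, the general idea behind expression-based approaches such as inferCNV. Each gene is expressed relative to normal reference cells, and the values are then averaged over neighbouring genes along the chromosome, so that a gain or loss spanning many genes shows up as a sustained shift. This is an illustration under stated assumptions, not the implementation of any benchmarked tool. The function name `infer_cnv_signal`, the window size, and the plain moving average are hypothetical choices. Because every value is measured against the reference cells, the sketch also suggests why the choice of reference data can change the result.

```python
import numpy as np


def infer_cnv_signal(expr: np.ndarray, ref_mask: np.ndarray, window: int = 5) -> np.ndarray:
    """Hypothetical, simplified expression-based CNV signal (not any tool's actual method).

    expr:     (n_genes, n_cells) log-expression matrix; genes are assumed to be
              ordered by genomic position on a single chromosome.
    ref_mask: boolean array of length n_cells marking normal reference cells.
    window:   number of neighbouring genes averaged together (odd, so it is centred).
    Returns an (n_genes, n_cells) matrix: positive values suggest a gain,
    negative values suggest a loss, relative to the reference cells.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    # Centre each gene on its mean expression in the reference (normal) cells.
    rel = expr - expr[:, ref_mask].mean(axis=1, keepdims=True)
    # Moving average along genomic position for each cell (column).
    # mode="same" keeps the gene count; values near chromosome ends are
    # damped because np.convolve implicitly zero-pads the edges.
    kernel = np.ones(window) / window
    return np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"), 0, rel)


# Toy example: cells 0-1 are references, cell 2 carries a gain on the first
# half of the chromosome, cell 3 a loss on the second half.
expr = np.zeros((20, 4))
expr[:10, 2] = 1.0
expr[10:, 3] = -1.0
signal = infer_cnv_signal(expr, np.array([True, True, False, False]), window=5)
```

Real tools add many further steps (filtering, denoising, segmentation, or probabilistic state calling), so the smoothed matrix here should be read only as the raw signal those steps refine.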
We benchmarked five commonly used scCNV inference methods: HoneyBADGER, CopyKAT, CaSpER, inferCNV, and sciCNV. We evaluated their performance across four scRNA-seq platforms using data from our previous multicenter study. We further evaluated scCNV performance using scRNA-seq datasets derived from mixed samples of five human lung adenocarcinoma cell lines. Finally, we sequenced tissues from a small cell lung cancer patient and used these data as a clinical scRNA-seq dataset to validate our findings.
We found that the sensitivity and specificity of the five scCNV inference methods varied depending on the choice of reference data, sequencing depth, and read length. CopyKAT and CaSpER outperformed the other methods overall, whereas inferCNV, sciCNV, and CopyKAT performed best in subclone identification. Batch effects significantly affected subclone identification in mixed datasets for most of the methods we tested.
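Sensitivity and specificity of CNV calls can be computed by comparing called states with ground-truth states over genomic bins. The abstract does not say how the study scored them, so the sketch below assumes one plausible scheme. A bin is positive when the truth is a gain or loss, and it counts as a true positive only if the call has the same direction. A neutral bin counts as a true negative only if it is also called neutral. The function name `sensitivity_specificity` and the state labels are hypothetical.

```python
from typing import Sequence, Tuple

NEUTRAL = "neutral"  # hypothetical label for copy-neutral bins


def sensitivity_specificity(called: Sequence[str], truth: Sequence[str]) -> Tuple[float, float]:
    """Per-bin sensitivity and specificity under an assumed scoring scheme.

    called, truth: per-bin states such as "gain", "loss", or "neutral".
    Returns (sensitivity, specificity); a value is NaN if its denominator is zero.
    """
    if len(called) != len(truth):
        raise ValueError("called and truth must cover the same bins")
    tp = fn = tn = fp = 0
    for c, t in zip(called, truth):
        if t != NEUTRAL:
            # Altered bin: detected only if the call matches the true direction.
            if c == t:
                tp += 1
            else:
                fn += 1
        elif c == NEUTRAL:
            tn += 1  # neutral bin correctly left uncalled
        else:
            fp += 1  # neutral bin wrongly called as gain or loss
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


truth = ["gain", "gain", "loss", "neutral", "neutral", "neutral", "neutral"]
called = ["gain", "neutral", "loss", "neutral", "neutral", "neutral", "gain"]
sens, spec = sensitivity_specificity(called, truth)
```

Here two of the three altered bins are recovered (sensitivity 2/3) and three of the four neutral bins are left neutral (specificity 0.75). A looser scheme that ignores gain/loss direction would score the same calls differently, which is one reason reported numbers depend on how a benchmark defines a correct call.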
Our benchmarking study revealed the strengths and weaknesses of each scCNV inference method and provided guidance for selecting the optimal method for CNV inference from scRNA-seq data.