Assessing the performance of methods for cell clustering from single-cell DNA sequencing data.

Affiliation

Department of Computer Science, Florida State University, Tallahassee, Florida, United States of America.

Publication information

PLoS Comput Biol. 2023 Oct 12;19(10):e1010480. doi: 10.1371/journal.pcbi.1010480. eCollection 2023 Oct.

Abstract

BACKGROUND

Many cancer genomes are known to contain more than one subclone within a single tumor, a phenomenon called intra-tumor heterogeneity (ITH). Characterizing ITH is essential for designing treatment plans, for prognosis, and for studying cancer progression. Single-cell DNA sequencing (scDNAseq) has proven effective in deciphering ITH. Cells belonging to each subclone are expected to carry a unique set of mutations, such as single nucleotide variations (SNVs). While many methods reconstruct the cancer evolutionary tree, few characterize subclonality directly without tree reconstruction. Tree reconstruction is important for studying cancer evolutionary history, but it is typically expensive in running time and memory because the search space of tree structures is huge. Subclonality characterization of single cells, by contrast, can be cast as a cell clustering problem, which has a much smaller dimension and a much shorter turnaround time. Although a few state-of-the-art cell clustering tools exist for scDNAseq, a comprehensive and objective comparison of them under different settings is lacking.

RESULTS

In this paper, we evaluated six state-of-the-art cell clustering tools (SCG, BnpC, SCClone, RobustClone, SCITE and SBMClone) on simulated data sets under a variety of parameter settings and on a real data set. We designed a simulator specifically for cell clustering and compared the methods in terms of clustering accuracy, specificity, sensitivity and running time. For SBMClone, we designed an ultra-low coverage large data set to evaluate its performance under an extremely high missing rate.

CONCLUSION

From the benchmark study, we conclude that BnpC and SCG have the highest clustering accuracy and are comparable to each other. However, BnpC has the advantage in running time when the number of cells is high (> 1500). It also has a higher clustering accuracy than SCG when the number of clusters is high (> 16). SCClone is the most accurate in estimating the number of clusters. RobustClone and SCITE have the lowest clustering accuracy in all experiments. SCITE tends to over-estimate the number of clusters and has a low specificity, whereas RobustClone tends to under-estimate the number of clusters and has a much lower sensitivity than the other methods. SBMClone produced reasonably good clustering (V-measure > 0.9) when coverage is ≥ 0.03 and is thus highly recommended for ultra-low coverage large scDNAseq data sets.
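
The conclusion cites a V-measure threshold without defining the score. As a minimal sketch, assuming the standard definition of V-measure (the harmonic mean of homogeneity and completeness, both derived from conditional entropies of the label assignments), it could be computed as follows. The function names and the toy labels are illustrative assumptions; they are not taken from the paper or from any of the benchmarked tools.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two label lists of equal length."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(true_labels, pred_labels):
    """Harmonic mean of homogeneity and completeness (standard definition)."""
    h_true = entropy(true_labels)
    h_pred = entropy(pred_labels)
    # Homogeneity: each predicted cluster contains cells of a single true subclone.
    homogeneity = 1.0 if h_true == 0 else 1.0 - conditional_entropy(true_labels, pred_labels) / h_true
    # Completeness: all cells of a true subclone land in the same predicted cluster.
    completeness = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(pred_labels, true_labels) / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical toy example: four cells, two true subclones.
truth = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]   # same partition, different cluster names
merged = [0, 0, 0, 0]       # everything placed in one cluster
score_relabelled = v_measure(truth, relabelled)
score_merged = v_measure(truth, merged)
```

The score depends only on how the cells are partitioned, not on the cluster names, so the relabelled partition scores as a perfect match while merging all cells into one cluster loses all homogeneity.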
