Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA.
Sci Rep. 2019 Jul 17;9(1):10357. doi: 10.1038/s41598-019-45938-x.
VCF2CNA is a tool (Linux command line or web interface) for copy-number alteration (CNA) analysis and tumor purity estimation from paired tumor-normal variant files in VCF format. It operates on both whole-genome and whole-exome datasets. To benchmark its performance, we applied it to 46 adult glioblastoma and 146 pediatric neuroblastoma samples sequenced on the Illumina and Complete Genomics (CGI) platforms, respectively. In high-quality whole-genome glioblastoma samples, VCF2CNA was highly consistent with a state-of-the-art algorithm that uses raw sequencing data (mean F1 score = 0.994), and it was robust to uneven coverage introduced by library artifacts. In the whole-genome neuroblastoma set, VCF2CNA identified MYCN high-level amplifications in 31 of 32 clinically validated samples, compared with 15 found by CGI's HMM-based CNA model. Moreover, VCF2CNA produced highly consistent CNA profiles between the WGS and WXS platforms (mean F1 score = 0.97 on a set of 15 rhabdomyosarcoma samples). In addition, VCF2CNA provides accurate tumor purity estimates for samples with sufficient CNAs. These results suggest that VCF2CNA is an accurate, efficient and platform-independent tool for CNA and tumor purity analyses that does not require access to raw sequence data.
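The abstract reports cross-method and cross-platform agreement as an F1 score but does not state the resolution at which calls were compared. As a minimal sketch, assuming both call sets have already been projected onto the same fixed genomic bins and labelled "gain", "loss" or "neutral" (an assumption, not the paper's stated procedure), such a score could be computed as below. The function name `cna_f1` and the bin labels are hypothetical.

```python
from typing import Sequence


def cna_f1(calls_a: Sequence[str], calls_b: Sequence[str], neutral: str = "neutral") -> float:
    """Hypothetical per-bin F1 agreement between two CNA call vectors.

    Assumes both vectors cover the same genomic bins in the same order.
    A bin is a true positive when both sets call the same non-neutral
    state (e.g. both "gain"); neutral bins are never counted as calls.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("call vectors must cover the same bins")

    # Bins where both methods agree on the same altered state.
    tp = sum(1 for a, b in zip(calls_a, calls_b) if a != neutral and a == b)
    # Total altered bins called by each method.
    called_a = sum(1 for a in calls_a if a != neutral)
    called_b = sum(1 for b in calls_b if b != neutral)

    # Two all-neutral profiles are treated as perfect agreement.
    if called_a == 0 and called_b == 0:
        return 1.0

    precision = tp / called_a if called_a else 0.0
    recall = tp / called_b if called_b else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall (symmetric in a and b).
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    a = ["gain", "gain", "neutral", "loss", "loss"]
    b = ["gain", "neutral", "neutral", "loss", "gain"]
    print(round(cna_f1(a, b), 4))  # 2 shared calls, 4 vs 3 called bins: F1 = 4/7
```

Because F1 is the harmonic mean of precision and recall, it is symmetric in the two call sets, so neither method has to be designated the reference. A bin-level comparison weights long segments more heavily than short ones; a segment-level comparison would weight them equally and could give different scores.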