Gabrielaite Migle, Torp Mathias Husted, Rasmussen Malthe Sebro, Andreu-Sánchez Sergio, Vieira Filipe Garrett, Pedersen Christina Bligaard, Kinalis Savvas, Madsen Majbritt Busk, Kodama Miyako, Demircan Gül Sude, Simonyan Arman, Yde Christina Westmose, Olsen Lars Rønn, Marvig Rasmus L, Østrup Olga, Rossing Maria, Nielsen Finn Cilius, Winther Ole, Bagger Frederik Otzen
Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Blegdamsvej 9, 2100 Copenhagen, Denmark.
Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, Ørsteds Pl. 345C, 2800 Kgs. Lyngby, Denmark.
Cancers (Basel). 2021 Dec 14;13(24):6283. doi: 10.3390/cancers13246283.
Copy-number variations (CNVs) have important clinical implications for several diseases and cancers. Relevant CNVs are hard to detect because common structural variations define large parts of the human genome. CNV calling from short-read sequencing would allow full genomic profiling with a single protocol. We reviewed 50 popular CNV calling tools and included 11 of them for benchmarking in a reference cohort of 39 whole genome sequencing (WGS) samples paired with the current clinical standard, SNP-array-based CNV calling. For nine of these samples we also performed whole exome sequencing (WES) to address the effect of sequencing protocol on CNV calling. Furthermore, we included the gold-standard reference sample NA12878 and tested 12 samples with CNVs confirmed by multiplex ligation-dependent probe amplification (MLPA). Tool performance varied greatly in the number of called CNVs and in bias for CNV lengths. Some tools had near-perfect recall of array-detected CNVs for some samples, but poor precision. Several tools performed better on NA12878, which could be a result of overfitting. We suggest combining the best-performing tools that are based on different methodologies: GATK gCNV, Lumpy, DELLY, and cn.MOPS. Reducing the total number of called variants could potentially be assisted by the use of background panels for filtering frequently called variants.
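To illustrate what recall and precision against array-based calls mean in this setting, the following is a minimal sketch and not the authors' benchmarking code. It assumes a 50% reciprocal-overlap criterion for deciding that a sequencing-based call matches an array call. The abstract does not state the matching rule, so the threshold, the `CNV` record, and the function names are all hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class CNV(NamedTuple):
    """Hypothetical minimal CNV record (not a format used in the paper)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV, min_frac: float = 0.5) -> bool:
    """True if a and b share at least min_frac of BOTH of their lengths.

    The 0.5 default is an assumed threshold, not one taken from the source.
    """
    if a.chrom != b.chrom or a.kind != b.kind:
        return False
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return False
    return (shared >= min_frac * (a.end - a.start)
            and shared >= min_frac * (b.end - b.start))


def precision_recall(calls: Sequence[CNV], truth: Sequence[CNV],
                     min_frac: float = 0.5) -> Tuple[float, float]:
    """Precision: fraction of calls matching any truth CNV.
    Recall: fraction of truth CNVs matched by any call."""
    matched_calls = sum(
        any(reciprocal_overlap(c, t, min_frac) for t in truth) for c in calls
    )
    matched_truth = sum(
        any(reciprocal_overlap(t, c, min_frac) for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# Toy data: two array-detected CNVs and four sequencing-based calls.
array_cnvs = [CNV("chr1", 1000, 5000, "DEL"), CNV("chr2", 200, 800, "DUP")]
tool_calls = [
    CNV("chr1", 1200, 5200, "DEL"),  # matches the chr1 deletion
    CNV("chr1", 9000, 9500, "DEL"),  # no array counterpart
    CNV("chr2", 100, 300, "DUP"),    # overlaps, but too little of the array CNV
    CNV("chr3", 0, 100, "DUP"),      # no array counterpart
]
print(precision_recall(tool_calls, array_cnvs))
```

With this toy input, a tool that emits many extra calls keeps its recall on matched events while its precision drops, which is the pattern the abstract describes for some tools.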