van Baardwijk M N, Heijnen L S E M, Zhao H, Baudis M, Stubbs A P
Department of Pathology and Clinical Bioinformatics, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands; Department of Surgery, Division of HPB & Transplant Surgery, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands.
Department of Pathology and Clinical Bioinformatics, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands.
Genomics. 2024 Nov;116(6):110962. doi: 10.1016/j.ygeno.2024.110962. Epub 2024 Nov 14.
Copy number variations (CNVs) are crucial in various diseases, especially cancer, but detecting them accurately from SNP genotyping arrays remains challenging. This study therefore benchmarked five CNV detection tools (PennCNV, QuantiSNP, iPattern, EnsembleCNV, and R-GADA) using SNP array and whole-genome sequencing (WGS) data from 2002 individuals of the DRAGEN re-analysis of the 1000 Genomes Project. Results showed significant variability in tool performance. R-GADA had the highest recall but low precision, while PennCNV was the most reliable in terms of precision and F1 score. EnsembleCNV improved recall by combining multiple callers but increased false positives. Overall, current tools, including newer methods, do not outperform PennCNV in precise CNV detection. Improved reference data and consensus on what counts as a true positive CNV call are necessary. This study provides valuable insights and scalable workflows for researchers selecting CNV detection methods in future studies.
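The precision, recall, and F1 figures mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how array calls were matched to WGS calls, so the matching rule here (same chromosome, same CNV type, at least 50% reciprocal overlap) is an assumption, and the names `reciprocal_overlap` and `score_calls` are hypothetical helpers, not part of any of the benchmarked tools.

```python
from typing import List, Tuple

# A CNV call as (chromosome, start, end, type), e.g. ("chr1", 100, 200, "DEL").
Cnv = Tuple[str, int, int, str]


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if no match."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0  # different chromosome or different CNV type
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def score_calls(calls: List[Cnv], truth: List[Cnv], min_overlap: float = 0.5):
    """Compute (precision, recall, F1) for array calls against a truth set.

    Assumption: a call is a true positive if it reciprocally overlaps any
    truth CNV by at least min_overlap; the threshold is illustrative only.
    """
    matched_calls = sum(
        1 for c in calls
        if any(reciprocal_overlap(c, t) >= min_overlap for t in truth)
    )
    matched_truth = sum(
        1 for t in truth
        if any(reciprocal_overlap(c, t) >= min_overlap for c in calls)
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data, invented for illustration; not from the study.
    calls = [("chr1", 100, 200, "DEL"), ("chr1", 500, 600, "DUP"), ("chr2", 100, 200, "DEL")]
    truth = [("chr1", 110, 210, "DEL"), ("chr3", 0, 100, "DEL")]
    print(score_calls(calls, truth))
```

A caller that emits many calls, as the abstract describes for R-GADA, would tend to raise `matched_truth` (recall) while lowering `matched_calls / len(calls)` (precision), which is the trade-off F1 summarizes.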