College of Animal Science, South China Agricultural University/Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding/National Engineering Research Center for Breeding Swine Industry, Guangzhou 510642, China.
Yi Chuan. 2023 Apr 20;45(4):324-340. doi: 10.16288/j.yczz.22-385.
Aberrant DNA methylation has been reported to give rise to copy number variations (CNVs), and CNVs may in turn alter DNA methylation levels. Whole genome bisulfite sequencing (WGBS) generates genome-wide sequencing data and therefore has the potential to detect CNVs. However, the performance of CNV detection from WGBS data has not been clearly evaluated. In this study, five software tools that use different CNV detection strategies, namely BreakDancer, cn.mops, CNVnator, DELLY and Pindel, were selected to explore and benchmark CNV detection with WGBS data. Using real (2.62 billion reads) and simulated (12.35 billion reads) human WGBS data, we calculated the number of CNVs detected, precision, recall, relative ability, memory usage and running time across 150 runs, and sought the optimal strategy for CNV detection with WGBS data. On the real WGBS data, Pindel detected the most deletions and duplications; CNVnator had the highest precision for deletions and cn.mops the highest precision for duplications; Pindel had the highest recall for deletions and cn.mops the highest recall for duplications. On the simulated WGBS data, BreakDancer detected the most deletions and cn.mops detected the most duplications, while CNVnator showed the highest precision and recall for both deletions and duplications. In both the real and the simulated WGBS data, the ability of CNVnator to detect CNVs appeared to exceed its ability in whole genome sequencing data. Additionally, DELLY and BreakDancer had the lowest peak memory usage and the least CPU runtime, while CNVnator had the highest peak memory usage and the most CPU runtime. Taken together, CNVnator and cn.mops showed excellent performance in CNV detection with WGBS data. These results suggest that it is feasible to detect CNVs using WGBS data, and provide useful information for further investigating both CNVs and DNA methylation using WGBS data alone.
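As an illustration of how precision and recall can be computed for CNV calls against a reference set, the following is a minimal sketch. The abstract does not state how detected CNVs were matched to the reference, so the 50% reciprocal-overlap rule used here is an assumption, and the names `CNV`, `reciprocal_overlap` and `precision_recall` are hypothetical helpers, not part of any of the benchmarked tools.

```python
from typing import List, Tuple

# A CNV as (chromosome, start, end, type); type is "DEL" or "DUP".
# Coordinates are treated as half-open intervals [start, end).
CNV = Tuple[str, int, int, str]


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Return the smaller of the two fractions covered by the intersection.

    CNVs on different chromosomes or of different types never overlap.
    """
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def precision_recall(
    calls: List[CNV], truth: List[CNV], min_overlap: float = 0.5
) -> Tuple[float, float]:
    """Precision and recall under an ASSUMED reciprocal-overlap matching rule.

    precision = matched calls / all calls
    recall    = matched reference CNVs / all reference CNVs
    """
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth) for c in calls
    )
    matched_truth = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall
```

Counting matched calls and matched reference CNVs separately keeps both ratios within [0, 1] even when several calls overlap the same reference CNV. A stricter or looser `min_overlap` would change the reported values, which is one reason benchmark results depend on the matching rule chosen.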