Zhao Shanghui, Xu Dantong, Cai Jiali, Shen Qingpeng, He Mingran, Pan Xiangchun, Gao Yahui, Li Jiaqi, Yuan Xiaolong
State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Laboratory of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong 510642, China.
National Center of Technology Innovation for Pigs, Chongqing 402460, China.
Comput Struct Biotechnol J. 2025 Mar 6;27:912-919. doi: 10.1016/j.csbj.2025.02.040. eCollection 2025.
It is important to dissect the relationship between copy number variations (CNVs) and DNA methylation, because both substantially alter gene dosage and are implicated in diverse human cancers. Although whole-genome bisulfite sequencing (WGBS) data carry information on both CNVs and DNA methylation, no study has provided a systematic benchmark for detecting CNVs from WGBS data. Herein, using simulated and real WGBS datasets totaling 84.62 billion reads, we performed 714 CNV detections to comprehensively benchmark 35 strategies, each pairing one of 5 alignment algorithms (bismarkbt2, bsbolt, bsmap, bwameth, and walt) with one of 7 CNV detection applications (BreakDancer, cn.mops, CNVkit, CNVnator, DELLY, GASV, and Pindel). The results highlighted a subset of strategies that called CNVs accurately, as judged by the number and length of detected CNVs and by the precision, recall, and F1 score of the detections. We found that bwameth-DELLY and bwameth-BreakDancer were the best strategies for calling deletions, and that walt-CNVnator and bismarkbt2-CNVnator were the best strategies for calling duplications. This work provides investigators with useful information for accurately exploring CNVs from human WGBS data.
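To make the benchmark design concrete, the following is a minimal Python sketch of how the 35 aligner-caller strategies can be enumerated and how precision, recall, and F1 score might be computed for one set of CNV calls. The abstract does not state how a call was matched to a true CNV, so the 50% reciprocal-overlap rule, the tuple layout, and the function names `overlap_fraction` and `score_calls` are assumptions made for illustration, not the authors' method.

```python
from itertools import product

ALIGNERS = ["bismarkbt2", "bsbolt", "bsmap", "bwameth", "walt"]
CALLERS = ["BreakDancer", "cn.mops", "CNVkit", "CNVnator", "DELLY", "GASV", "Pindel"]

# Each strategy pairs one aligner with one CNV caller: 5 x 7 = 35.
STRATEGIES = [f"{aligner}-{caller}" for aligner, caller in product(ALIGNERS, CALLERS)]

# A CNV is represented here as (chrom, start, end, type) with a half-open
# interval [start, end). This layout is an assumption for the sketch.


def overlap_fraction(a, b):
    """Return the smaller of the two overlap fractions between CNVs a and b."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0  # different chromosome or different CNV type
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def score_calls(calls, truth, min_overlap=0.5):
    """Compute (precision, recall, F1) under an assumed reciprocal-overlap rule."""
    matched_calls = sum(
        any(overlap_fraction(c, t) >= min_overlap for t in truth) for c in calls
    )
    matched_truth = sum(
        any(overlap_fraction(c, t) >= min_overlap for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only.
truth = [
    ("chr1", 1000, 2000, "DEL"),
    ("chr1", 5000, 8000, "DUP"),
    ("chr2", 100, 600, "DEL"),
]
calls = [
    ("chr1", 1100, 2100, "DEL"),  # 90% reciprocal overlap: counted as a match
    ("chr1", 5000, 6000, "DUP"),  # covers only a third of the true CNV: no match
    ("chr3", 0, 500, "DEL"),      # no true CNV on this chromosome
    ("chr2", 100, 600, "DEL"),    # exact match
]

precision, recall, f1 = score_calls(calls, truth)
```

With this toy data, two of the four calls match a true CNV and two of the three true CNVs are recovered. Scoring deletions and duplications separately, as the abstract's conclusions do, would only require filtering `calls` and `truth` by type before calling `score_calls`.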