Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
HIM-BGI Omics Center, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences (CAS), Hangzhou, China.
BMC Bioinformatics. 2024 Jun 5;25(1):206. doi: 10.1186/s12859-024-05821-7.
Bisulfite sequencing (BS-Seq) is a fundamental technique for characterizing DNA methylation profiles. Genotype calling from bisulfite-converted BS-Seq data enables allele-specific methylation analysis and the concurrent exploration of genetic and epigenetic profiles. Although various methods have been proposed, calling single nucleotide polymorphisms (SNPs) from BS-Seq data remains challenging, particularly for SNPs on chromosome X and in the presence of contaminated data.
We introduce bsgenova, a novel SNP caller tailored for bisulfite sequencing data that employs a Bayesian multinomial model. We assessed the performance of bsgenova by comparing SNPs called from real-world BS-Seq data with those called from corresponding whole-genome sequencing (WGS) data across three human cell lines. Compared with three existing methods, bsgenova is both sensitive and precise, especially for chromosome X. Moreover, bsgenova notably outperforms the other methods in the presence of low-quality reads. bsgenova is also carefully implemented, leveraging matrix imputation and multi-process parallelization; compared with existing methods, it stands out for its speed and its efficient use of memory and disk. Finally, bsgenova integrates bsextractor, a methylation extractor, which broadens its flexibility and utility.
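To illustrate what a Bayesian multinomial genotype model can look like for bisulfite data, the following Python sketch computes posterior probabilities over the ten unordered diploid genotypes at a single site. It is not bsgenova's model: the Watson-strand-only base counts, the fixed methylation level, the uniform sequencing-error rate, the heterozygosity-based prior (`theta`), and all function names are hypothetical simplifications chosen for illustration.

```python
import itertools
import math

BASES = "ACGT"  # count vectors are ordered (A, C, G, T)


def allele_base_probs(allele, methylation, error):
    """Per-base observation probabilities on a Watson-strand read for one allele.

    Hypothetical simplification: bisulfite converts unmethylated C to T, so a
    C allele is read as C with probability `methylation` and as T otherwise.
    Sequencing errors are uniform over the three other bases (requires error > 0).
    """
    true_dist = {"C": methylation, "T": 1.0 - methylation} if allele == "C" else {allele: 1.0}
    probs = dict.fromkeys(BASES, 0.0)
    for true_base, weight in true_dist.items():
        for base in BASES:
            probs[base] += weight * ((1.0 - error) if base == true_base else error / 3.0)
    return probs


def genotype_probs(genotype, methylation, error):
    """Multinomial parameters for a diploid genotype: each allele contributes half the reads."""
    p1 = allele_base_probs(genotype[0], methylation, error)
    p2 = allele_base_probs(genotype[1], methylation, error)
    return [0.5 * p1[b] + 0.5 * p2[b] for b in BASES]


def log_multinomial_pmf(counts, probs):
    """Log multinomial probability of the observed base counts."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
    return log_coef + sum(k * math.log(p) for k, p in zip(counts, probs) if k > 0)


def genotype_prior(genotype, ref, theta):
    """Hypothetical unnormalized prior: hom-ref 1, ref/alt theta, hom-alt theta/2, alt/alt theta^2."""
    if genotype == (ref, ref):
        return 1.0
    if ref in genotype:
        return theta
    if genotype[0] == genotype[1]:
        return theta / 2.0
    return theta ** 2


def genotype_posterior(counts, ref, methylation=0.5, error=0.01, theta=0.001):
    """Posterior over the 10 unordered diploid genotypes via Bayes' rule."""
    genotypes = list(itertools.combinations_with_replacement(BASES, 2))
    log_post = [
        math.log(genotype_prior(g, ref, theta))
        + log_multinomial_pmf(counts, genotype_probs(g, methylation, error))
        for g in genotypes
    ]
    top = max(log_post)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(x - top) for x in log_post]
    total = sum(weights)
    return {"".join(g): w / total for g, w in zip(genotypes, weights)}


# Example: 10 A reads and 10 G reads at a site whose reference base is A.
posterior = genotype_posterior((10, 0, 10, 0), ref="A")
print(max(posterior, key=posterior.get))  # prints "AG"
```

The same counts can support different calls depending on the assumed methylation level: with `methylation=0.0`, an all-T pileup (`(0, 0, 0, 20)`) at a reference C is best explained by a homozygous `CC` genotype, because every unmethylated C reads as T, whereas with `methylation=0.5` the sketch favors `TT`. From Watson-strand counts alone, a C>T SNP cannot be separated from an unmethylated C, so in this simplified model the assumed methylation level drives the call.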
We introduce bsgenova for SNP calling from bisulfite sequencing data. The source code is available at https://github.com/hippo-yf/bsgenova under the GPL-3.0 license.