Terzian Paul, Vandecasteele Céline, Lledo Joanna, Serre Rémy-Félix, Sabban Jules, Kuchly Claire, Pitel Frédérique, Leroux Sophie, Demars Julie, Iannuccelli Nathalie, Fève Katia, Bonnet Michèle, Gaspin Christine, Milan Denis, Iampietro Carole, Klopp Christophe, Donnadieu Cécile
Université Fédérale de Toulouse, INRAE, BioinfOmics, GenoToul Bioinformatics facility, 31326, Castanet-Tolosan, France.
INRAE, US 1426, GeT-PlaGe, Genotoul, France Génomique, Université Fédérale de Toulouse, Castanet-Tolosan, France.
Sci Data. 2025 Apr 1;12(1):556. doi: 10.1038/s41597-025-04769-4.
CpG methylation, a key epigenetic mark involved in gene regulation, development, and other biological processes, is commonly analyzed using Whole-Genome Bisulfite Sequencing (WGBS). However, bisulfite treatment causes significant DNA degradation. Enzymatic Methyl-seq (EM-seq) offers a short-read alternative that preserves DNA integrity but requires conversion steps, limiting its compatibility with downstream analyses. Third-generation sequencing technologies, such as Oxford Nanopore Technologies (ONT) and PacBio, enable direct detection of DNA modifications without altering the DNA, providing simultaneous genome and epigenome information. This work presents a comprehensive dataset combining long- and short-read sequencing data, including ONT, PacBio, EM-seq, and WGBS, for two agronomically relevant species: pig (Sus scrofa) and quail (Coturnix japonica). Data quality evaluation reveals high nucleotide quality scores for PacBio and short reads, robust alignment rates for long reads, and inter-method correlations in CpG methylation calling ranging from 0.76 to 0.99. This dataset is a valuable resource for training methylation callers and represents the first combined methylation dataset for these species, providing an essential benchmark for assessing emerging sequencing technologies.
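The inter-method correlations reported in the abstract compare per-CpG methylation levels obtained from two technologies. The sketch below illustrates one plausible way to compute such a figure; it is not the authors' pipeline. It assumes a Pearson correlation, a minimum read coverage of 10 at each site, and an in-memory input of the form `(chrom, pos) -> (methylated_reads, total_reads)`. The function names and the threshold are hypothetical and do not come from the abstract.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def inter_method_correlation(calls_a, calls_b, min_coverage=10):
    """Correlate methylation fractions at CpG sites covered by both methods.

    calls_a, calls_b: dict mapping (chrom, pos) to
    (methylated_reads, total_reads). min_coverage is an assumed threshold.
    Returns (correlation, number_of_shared_sites).
    """
    # Keep only sites present in both call sets with enough reads in each.
    shared = sorted(
        site
        for site in calls_a.keys() & calls_b.keys()
        if calls_a[site][1] >= min_coverage and calls_b[site][1] >= min_coverage
    )
    # Methylation level = methylated reads / total reads at the site.
    levels_a = [calls_a[s][0] / calls_a[s][1] for s in shared]
    levels_b = [calls_b[s][0] / calls_b[s][1] for s in shared]
    return pearson(levels_a, levels_b), len(shared)


if __name__ == "__main__":
    # Toy values for illustration only; not taken from the dataset.
    method_a = {("chr1", 100): (9, 10), ("chr1", 200): (1, 10), ("chr1", 300): (5, 10)}
    method_b = {("chr1", 100): (18, 20), ("chr1", 200): (3, 20), ("chr1", 300): (11, 20)}
    r, n_sites = inter_method_correlation(method_a, method_b)
    print(f"r = {r:.3f} over {n_sites} shared CpG sites")
```

Restricting the comparison to sites covered by both methods keeps low-coverage sites, whose methylation fractions are noisy, from distorting the coefficient. In practice the per-site counts would be parsed from each caller's output files rather than built by hand.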