ecSeq Bioinformatics GmbH, Sternwartenstraße 29, 04103, Saxony, Germany.
Institut für Informatik, Universität Leipzig, Härtelstraße 16-18, 04107, Saxony, Germany.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab021.
Whole genome bisulfite sequencing is currently at the forefront of epigenetic analysis, facilitating the nucleotide-level resolution of 5-methylcytosine (5mC) on a genome-wide scale. Specialized software has been developed to accommodate the unique difficulties in aligning such sequencing reads to a given reference, building on the knowledge acquired from model organisms such as human or Arabidopsis thaliana. As the field of epigenetics expands its purview to non-model plant species, new challenges arise which bring into question the suitability of previously established tools. Herein, nine short-read aligners are evaluated: Bismark, BS-Seeker2, BSMAP, BWA-meth, ERNE-BS5, GEM3, GSNAP, Last and segemehl. Precision-recall of simulated alignments, in comparison to real sequencing data obtained from three natural accessions, reveals that, on balance, BWA-meth and BSMAP are able to make the best use of the data during mapping. The influence of difficult-to-map regions, characterized by deviations in sequencing depth over repeat annotations, is evaluated in terms of the mean absolute deviation of the resulting methylation calls in comparison to a realistic methylome. Downstream methylation analysis is responsive to the handling of multi-mapping reads relative to mapping quality (MAPQ), and potentially susceptible to bias arising from the increased sequence complexity of densely methylated reads.
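The two evaluation measures named in the abstract, precision-recall of simulated alignments and the mean absolute deviation of methylation calls, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the dictionary-based inputs, the `tolerance` parameter and the rule that a read counts as correct when it maps to its simulated chromosome and start position are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: (chromosome, 1-based start position).
Position = Tuple[str, int]


def precision_recall(
    truth: Dict[str, Position],
    mapped: Dict[str, Optional[Position]],
    tolerance: int = 0,
) -> Tuple[float, float]:
    """Precision and recall of alignments against simulated origins.

    Assumption: a read is correct if it maps to the simulated chromosome
    within `tolerance` bases of the simulated start. Unmapped reads are
    given as None and count against recall only.
    """
    aligned = {read: pos for read, pos in mapped.items() if pos is not None}
    correct = sum(
        1
        for read, pos in aligned.items()
        if read in truth
        and pos[0] == truth[read][0]
        and abs(pos[1] - truth[read][1]) <= tolerance
    )
    precision = correct / len(aligned) if aligned else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


def mean_absolute_deviation(
    observed: Dict[Position, float],
    expected: Dict[Position, float],
) -> float:
    """Mean absolute deviation of methylation levels (0..1) over the
    cytosine sites present in both the observed calls and the reference
    methylome. Restricting to shared sites is an assumption."""
    shared = observed.keys() & expected.keys()
    if not shared:
        raise ValueError("no shared cytosine sites to compare")
    return sum(abs(observed[s] - expected[s]) for s in shared) / len(shared)


if __name__ == "__main__":
    truth = {"r1": ("chr1", 100), "r2": ("chr1", 200),
             "r3": ("chr2", 50), "r4": ("chr2", 75)}
    mapped = {"r1": ("chr1", 100), "r2": ("chr1", 250),
              "r3": None, "r4": ("chr2", 75)}
    print(precision_recall(truth, mapped))

    observed = {("chr1", 10): 0.8, ("chr1", 20): 0.1}
    expected = {("chr1", 10): 1.0, ("chr1", 20): 0.0, ("chr1", 30): 0.5}
    print(mean_absolute_deviation(observed, expected))
```

In this toy input, three of the four simulated reads are aligned and two of them land on their simulated origin, so precision is 2/3 and recall is 2/4. Sites missing from either methylome are skipped by this sketch; a real comparison would need an explicit policy for uncovered sites.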