Lien Annette, Legori Leonardo Pestana, Kraft Louis, Sackett Peter Wad, Renaud Gabriel
Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kongens Lyngby, Denmark.
University of Debrecen, Debrecen, Hungary.
Front Bioinform. 2023 Dec 7;3:1260486. doi: 10.3389/fbinf.2023.1260486. eCollection 2023.
Ancient DNA is highly degraded, resulting in very short sequences. Reads generated with modern high-throughput sequencing machines are generally longer than ancient DNA molecules, so the reads often contain some portion of the sequencing adapters. It is crucial to remove those adapters, as they can interfere with downstream analysis. Furthermore, when DNA has been sequenced in both the forward and reverse directions (paired-end), the overlapping portions of the two reads can be merged to correct sequencing errors and improve read quality. Several tools have been developed for adapter trimming and read merging; however, no one has attempted to evaluate their accuracy or their potential impact on downstream analyses. Using simulated sequencing data, seven commonly used tools were evaluated on their ability to reconstruct ancient DNA sequences through read merging. The tools differ notably in their ability to correct sequencing errors and to identify the correct read overlap, but the most substantial difference lies in their ability to calculate quality scores for merged bases. Selecting the most appropriate tool for a given project depends on several factors; nevertheless, some tools, such as fastp, have shortcomings, whereas others, such as leeHom, outperform the other tools in most aspects. Although the choice of tool did not result in a measurable difference in a population genetics analysis based on principal component analysis, downstream analyses that are sensitive to wrongly merged reads or that rely on quality scores can be significantly impacted by the choice of tool.
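The adapter trimming and read merging described in the abstract can be illustrated with a minimal sketch. It is not the algorithm of fastp, leeHom, or any other benchmarked tool. It assumes the fragment is shorter than the read length (the typical ancient DNA case), so the fragment is the prefix of the forward read and the suffix of the reverse-complemented reverse read. The function names, the mismatch threshold, the Phred cap of 40, and the quality-combination rule are all hypothetical simplifications chosen for illustration.

```python
# Illustrative sketch only: naive paired-end merging for short fragments.
# All names and parameters here are assumptions, not taken from any real tool.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
MAX_PHRED = 40  # assumed upper bound for merged quality scores


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, fwd_q, rev, rev_q, min_overlap=5, max_mismatch_frac=0.1):
    """Merge a read pair whose fragment is no longer than either read.

    fwd, rev     -- read sequences as produced by the sequencer
    fwd_q, rev_q -- lists of integer Phred scores, one per base
    Returns (merged_sequence, merged_qualities), or None if no acceptable
    overlap is found. Adapter bases fall outside the overlap and are dropped.
    """
    rc = reverse_complement(rev)
    rc_q = rev_q[::-1]
    # Try candidate fragment lengths from longest to shortest.
    for length in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        a = fwd[:length]
        b = rc[len(rc) - length:]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches > max_mismatch_frac * length:
            continue
        a_q = fwd_q[:length]
        b_q = rc_q[len(rc_q) - length:]
        seq, quals = [], []
        for x, qx, y, qy in zip(a, a_q, b, b_q):
            if x == y:
                # Agreement: keep the base and add the scores, capped.
                seq.append(x)
                quals.append(min(qx + qy, MAX_PHRED))
            elif qx >= qy:
                # Disagreement: keep the more confident base, lower its score.
                seq.append(x)
                quals.append(qx - qy)
            else:
                seq.append(y)
                quals.append(qy - qx)
        return "".join(seq), quals
    return None
```

The rule for combining quality scores is the crudest part of this sketch, and the abstract reports that this is the step on which the benchmarked tools differ most.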