Australian National University, Research School of Biology, Division of Ecology and Evolution, Acton, Canberra, ACT, 2600, Australia.
CSIRO Land and Water, Integrated Omics Team, Black Mountain Laboratories, Canberra, ACT, 2600, Australia.
BMC Genomics. 2020 Feb 28;21(1):188. doi: 10.1186/s12864-020-6594-0.
Next-generation sequencing (NGS) can recover DNA data from valuable extant and extinct museum specimens. However, archived or preserved DNA is difficult to sequence because it is fragmented and damaged, and even the most successful NGS methods for preserved specimens remain sub-optimal. Improving wet-lab protocols and comprehensively determining the effects of sample age on NGS library quality are therefore of vital importance. Here, I examine the relationship between sample age and several indicators of library quality following targeted sequencing of ~1300 loci in 271 pinned moth specimens (Helicoverpa armigera) ranging in age from 5 to 117 years.
I find that older samples have lower DNA concentrations following extraction and thus require more indexing PCR cycles during library preparation. Whether sequenced reads are aligned to a reference genome or to only the targeted region, older samples have fewer sequenced and mapped reads, lower mean coverage, and lower estimated library sizes, while the percentage of adapters in sequenced reads increases significantly with sample age. Older samples also show the poorest capture success, with lower enrichment and a larger anticipated gain in coverage from further sequencing.
Sample age has significant, measurable impacts on the quality of NGS data following targeted enrichment. However, incorporating a uracil-removing enzyme into the blunt end-repair step during library preparation could help to repair DNA damage, and using a method that prevents adapter-dimer formation may result in improved data yields.
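Enrichment is one of the capture-success indicators named above. The abstract does not say how it was calculated, so the following is only a minimal sketch of one common definition, fold enrichment: the fraction of mapped reads that fall on target, divided by the fraction of the genome that the target covers. The function name, parameter names, and all numbers are hypothetical and are not taken from the study.

```python
def fold_enrichment(on_target_reads, mapped_reads, target_bp, genome_bp):
    """Fold enrichment of a capture library (assumed definition, not the paper's).

    observed = share of mapped reads that land on the targeted region
    expected = share of the genome occupied by the targeted region
    """
    if mapped_reads <= 0 or target_bp <= 0 or genome_bp <= 0:
        raise ValueError("mapped_reads, target_bp and genome_bp must be positive")
    if on_target_reads < 0 or on_target_reads > mapped_reads:
        raise ValueError("on_target_reads must lie between 0 and mapped_reads")
    observed = on_target_reads / mapped_reads
    expected = target_bp / genome_bp
    return observed / expected


# Hypothetical libraries with the same sequencing depth but different capture success.
# These values are made up for illustration only.
recent = fold_enrichment(on_target_reads=400_000, mapped_reads=1_000_000,
                         target_bp=1_000_000, genome_bp=350_000_000)
old = fold_enrichment(on_target_reads=50_000, mapped_reads=1_000_000,
                      target_bp=1_000_000, genome_bp=350_000_000)
print(f"recent: {recent:.1f}x, old: {old:.1f}x")
```

Under this definition, a library whose on-target share of mapped reads is smaller scores a proportionally lower fold enrichment, which is the direction of the trend reported for older samples.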