Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany.
Genome Res. 2020 Oct;30(10):1449-1457. doi: 10.1101/gr.263863.120. Epub 2020 Sep 22.
Extensive manipulations involved in the preparation of DNA samples for sequencing have so far made it impossible to determine the precise structure of the double-stranded DNA fragments being sequenced, such as the presence of blunt ends, single-stranded overhangs, or single-strand breaks. Here we describe MatchSeq, a method that combines single-stranded DNA library preparation from diluted DNA samples with computational sequence matching, allowing the reconstruction of double-stranded DNA fragments at the single-molecule level. The application of MatchSeq to Neanderthal DNA, a particularly complex source of degraded DNA, reveals that 1- or 2-nt overhangs and blunt ends dominate the ends of ancient DNA molecules and that short gaps exist, which are predominantly caused by the loss of individual purines. We further show that deamination of cytosine to uracil occurs in both single- and double-stranded contexts close to the ends of molecules, and that single-stranded parts of DNA fragments are enriched in pyrimidines. MatchSeq provides unprecedented resolution for interrogating the structures of fragmented double-stranded DNA and can be applied to fragmented double-stranded DNA isolated from any biological source. The method relies on well-established laboratory techniques and can easily be integrated into routine data generation. This is demonstrated by the successful reconstruction of double-stranded DNA fragments from previously published single-stranded sequence data, allowing a more comprehensive characterization of the biochemical properties not only of ancient DNA but also of cell-free DNA from human blood plasma, a clinically relevant marker for the diagnosis and monitoring of disease.
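The abstract does not spell out how the computational sequence matching works. As a rough illustration only, the sketch below shows one plausible reading of the idea: after single-stranded library preparation from a diluted sample, a forward-strand read and a reverse-strand read that overlap on the reference and have no competing partner are treated as the two strands of one original molecule, and the offsets between their ends give the end structure. The `Read` type, the function names, the half-open coordinate convention, the one-to-one overlap rule, and the assumption that reads are already mapped and deduplicated are all hypothetical choices made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Read:
    """A mapped, deduplicated read (hypothetical minimal representation)."""
    start: int   # 0-based reference coordinate, inclusive
    end: int     # reference coordinate, exclusive
    strand: str  # "+" or "-"


def overlaps(a: Read, b: Read) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a.start < b.end and b.start < a.end


def match_strands(reads: List[Read]) -> List[Tuple[Read, Read]]:
    """Pair each '+' read with a '-' read only when the overlap is one-to-one.

    Assumption: in a sufficiently diluted library, an unambiguous overlap
    between opposite strands indicates a shared molecule of origin.
    """
    plus = [r for r in reads if r.strand == "+"]
    minus = [r for r in reads if r.strand == "-"]
    pairs = []
    for p in plus:
        candidates = [m for m in minus if overlaps(p, m)]
        if len(candidates) != 1:
            continue  # no partner, or ambiguous
        m = candidates[0]
        if sum(overlaps(q, m) for q in plus) != 1:
            continue  # the '-' read has several possible partners
        pairs.append((p, m))
    return pairs


def classify(offset: int) -> Tuple[str, int]:
    """Translate a signed end offset into an end type and overhang length."""
    if offset == 0:
        return ("blunt", 0)
    if offset > 0:
        return ("5' overhang", offset)
    return ("3' overhang", -offset)


def end_structure(plus: Read, minus: Read) -> Tuple[Tuple[str, int], Tuple[str, int]]:
    """Describe the left and right ends of a matched pair.

    The 5' end of the '+' strand lies at its start and the 5' end of the
    '-' strand lies at its end, so in both cases a positive offset means
    the 5' end protrudes.
    """
    left = classify(minus.start - plus.start)
    right = classify(minus.end - plus.end)
    return left, right


reads = [
    Read(100, 150, "+"),
    Read(102, 150, "-"),
    Read(300, 340, "+"),  # no opposite-strand partner, stays unmatched
]
pairs = match_strands(reads)
structures = [end_structure(p, m) for p, m in pairs]
# structures == [(("5' overhang", 2), ("blunt", 0))]
```

A real implementation would also need to handle sequencing errors, alignment uncertainty, and internal gaps or single-strand breaks, none of which this sketch attempts.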