Laboratory of Pharmaceutical Biotechnology, Ghent University, Harelbekestraat 72, 9000 Ghent, Belgium.
Nucleic Acids Res. 2012 Feb;40(3):e24. doi: 10.1093/nar/gkr1000. Epub 2011 Nov 29.
Standard Illumina mate-paired libraries are constructed from 3- to 5-kb DNA fragments by blunt-end circularization. Sequencing reads that pass through the junction of the two joined ends of a 3- to 5-kb DNA fragment are not easy to identify and pose problems during mapping and de novo assembly. Longer read lengths increase the probability that a read will cross the junction. To solve this problem, we developed a mate-paired protocol for Illumina sequencing technology that uses Cre-Lox recombination instead of blunt-end circularization. In this method, a LoxP sequence is incorporated at the junction site. This sequence allows reads to be screened for junctions without a reference genome. Junction reads can be trimmed or split at the junction. Moreover, the location of the LoxP sequence in the reads distinguishes mate-paired reads from spurious paired-end reads. We tested this new method by preparing and sequencing a mate-paired library with an insert size of 3 kb from Saccharomyces cerevisiae. We present an analysis of the library quality statistics and a new bioinformatics tool called DeLoxer that can be used to analyze an Illumina Cre-Lox mate-paired data set. We also demonstrate how the resulting data significantly improve a de novo assembly of the S. cerevisiae genome.
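The junction-screening idea (find the LoxP site in a read, then split the read into the two segments that flank it) can be illustrated with a minimal Python sketch. It assumes the canonical 34-bp loxP sequence and exact string matching in either orientation. It is not DeLoxer's implementation, which would also need to tolerate sequencing errors and partial LoxP sites at read ends. The names `reverse_complement`, `split_at_loxp` and the `min_len` threshold are hypothetical, chosen for illustration only.

```python
# Assumption: the canonical 34-bp loxP site (13-bp inverted repeat,
# 8-bp spacer, 13-bp inverted repeat). The construct used in the
# protocol may differ.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


# The spacer is asymmetric, so the site reads differently on the other strand.
LOXP_RC = reverse_complement(LOXP)


def split_at_loxp(read: str, min_len: int = 20) -> list[str]:
    """Split a read at the first exact loxP match, in either orientation.

    Hypothetical helper: returns the flanking segments that are at least
    min_len bases long, with the loxP site itself removed. A read with no
    exact loxP match is returned upper-cased and otherwise unchanged.
    Partial sites and mismatches are NOT handled in this sketch.
    """
    seq = read.upper()
    for site in (LOXP, LOXP_RC):
        pos = seq.find(site)
        if pos != -1:
            left = seq[:pos]                 # sequence before the junction
            right = seq[pos + len(site):]    # sequence after the junction
            return [s for s in (left, right) if len(s) >= min_len]
    return [seq]
```

Dropping segments shorter than `min_len` mirrors the abstract's choice between trimming a junction read (one usable flank remains) and splitting it (both flanks are long enough to map independently).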