Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA.
Genome Res. 2012 Nov;22(11):2241-9. doi: 10.1101/gr.138925.112. Epub 2012 Jul 16.
Eliminating the bacterial cloning step has been a major factor in the vastly improved efficiency of massively parallel sequencing approaches. However, this has also made it a technical challenge to produce the modern equivalent of the Fosmid- or BAC-end sequences that were crucial for assembling and analyzing complex genomes during the Sanger-based sequencing era. To close this technology gap, we developed Fosill, a method for converting Fosmids to Illumina-compatible jumping libraries. We constructed Fosmid libraries in vectors with Illumina primer sequences and specific nicking sites flanking the cloning site. Our family of pFosill vectors allows multiplex Fosmid cloning of end-tagged genomic fragments without physical size selection and is compatible with standard and multiplex paired-end Illumina sequencing. To excise the bulk of each cloned insert, we introduced two nicks in the vector, moved them into the insert by nick translation, and cleaved the insert at the translated nicks. Recircularization of the vector via coligation of the insert termini, followed by inverse PCR, generates a jumping library for paired-end sequencing with 101-base reads. The yield of unique Fosmid-sized jumps is sufficiently high, and the background of short, incorrectly spaced, and chimeric artifacts sufficiently low, to enable applications such as mapping of structural variation and scaffolding of de novo assemblies. We demonstrate the power of Fosill by mapping genome rearrangements in a cancer cell line, identifying three fusion genes that were corroborated by RNA-seq data. Our Fosill-powered assembly of the mouse genome has an N50 scaffold length of 17.0 Mb, rivaling the connectivity (16.9 Mb) of the Sanger sequencing-based draft assembly.
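The N50 scaffold length quoted above is a connectivity statistic: it is the length L such that scaffolds of length L or longer together contain at least half of the total assembled bases. The following is a minimal sketch of that standard definition, not code from the paper; the function name `n50` and the toy scaffold lengths are illustrative assumptions.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Illustrative helper (hypothetical name): the N50 is the length of the
    shortest scaffold in the smallest set of longest scaffolds that
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")

    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


if __name__ == "__main__":
    # Toy lengths (made up for illustration, arbitrary units).
    toy_scaffolds = [2, 3, 4, 5, 6]
    print(n50(toy_scaffolds))  # prints 5: 6 + 5 = 11 covers half of 20
```

Because the statistic depends only on the sorted length distribution, two assemblies with similar N50 values (such as the 17.0 Mb and 16.9 Mb figures above) have comparable long-range connectivity by this measure, although N50 says nothing about correctness of the joins.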