Institute for Physical Sciences and Technology, University of Maryland, College Park, Maryland 20742.
Genetics. 2014 Mar;196(3):875-90. doi: 10.1534/genetics.113.159715.
Conifers are the predominant gymnosperms. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
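For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows how it is conventionally computed: the scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the summed assembly length. The function name `n50` and the example lengths are hypothetical and chosen only for illustration; they are not data from the loblolly pine assembly, and the abstract does not state which tool the authors used to compute the figure.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths in kbp (illustrative only).
example_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_scaffolds))
```

Note that N50 is measured against the total length of the assembly itself; a related statistic, NG50, uses an estimated genome size as the denominator instead.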