Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies.

Author information

Neale David B, Wegrzyn Jill L, Stevens Kristian A, Zimin Aleksey V, Puiu Daniela, Crepeau Marc W, Cardeno Charis, Koriabine Maxim, Holtz-Morris Ann E, Liechty John D, Martínez-García Pedro J, Vasquez-Gross Hans A, Lin Brian Y, Zieve Jacob J, Dougherty William M, Fuentes-Soriano Sara, Wu Le-Shin, Gilbert Don, Marçais Guillaume, Roberts Michael, Holt Carson, Yandell Mark, Davis John M, Smith Katherine E, Dean Jeffrey F D, Lorenz W Walter, Whetten Ross W, Sederoff Ronald, Wheeler Nicholas, McGuire Patrick E, Main Doreen, Loopstra Carol A, Mockaitis Keithanne, deJong Pieter J, Yorke James A, Salzberg Steven L, Langley Charles H

Publication information

Genome Biol. 2014 Mar 4;15(3):R59. doi: 10.1186/gb-2014-15-3-r59.

Abstract

BACKGROUND

The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination.

RESULTS

We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next-generation sequence data generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly were used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome.

CONCLUSIONS

In addition to their value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrate a novel approach, one that can now be widely applied, to sequencing the large and complex genomes of this important group of plants.
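
For readers unfamiliar with the N50 scaffold size quoted in the Results, the statistic can be sketched as follows. This is a minimal illustration of the common definition (the length L such that scaffolds of length L or longer account for at least half of the total assembled length); the function name `n50` and the toy lengths are assumptions made for the example, not data or code from the paper, and the authors may have computed their figure against a different denominator, such as an estimated genome size.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Common definition (assumed here): sort lengths from longest to
    shortest and return the length at which the running total first
    reaches at least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths, not taken from the assembly.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))  # prints 80
```

In the toy case the total is 300, and the two longest scaffolds (100 + 80 = 180) are the first to cover half of it, so the N50 is 80.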

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f924/4053751/f296db3fcc59/gb-2014-15-3-r59-1.jpg
