Moore Michael J, Dhingra Amit, Soltis Pamela S, Shaw Regina, Farmerie William G, Folta Kevin M, Soltis Douglas E
Department of Botany, University of Florida, Gainesville, FL 32611, USA.
BMC Plant Biol. 2006 Aug 25;6:17. doi: 10.1186/1471-2229-6-17.
Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae).
More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6x in Nandina and 17.3x in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with approximately 60% of all errors associated with homopolymer runs of 5 or more nucleotides and approximately 50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions.
Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, this accuracy was achieved with a significant reduction in time and cost relative to traditional shotgun-based genome sequencing techniques, even though coverage was approximately half that of previously reported GS 20 de novo genome sequences. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically.
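The results above attribute most GS 20 errors to homopolymer runs, particularly runs of 5 or more nucleotides, and report error rates as a percentage of compared bases. The following is a minimal Python sketch of how such runs might be located and how a percent error rate might be computed. It is not the authors' pipeline: the function names, the plain-string sequence representation, and the example sequence are all hypothetical and chosen for illustration only.

```python
import re


def homopolymer_runs(seq, min_len=5):
    """Return (start, base, length) for every run of a single nucleotide
    that is at least min_len bases long. Positions are 0-based.

    Hypothetical helper for illustration; assumes an A/C/G/T string.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    seq = seq.upper()
    # One base, followed by at least (min_len - 1) repeats of that same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq)]


def error_rate_percent(n_errors, n_bases_compared):
    """Errors per compared base, expressed as a percentage."""
    if n_bases_compared <= 0:
        raise ValueError("n_bases_compared must be positive")
    return 100.0 * n_errors / n_bases_compared


# Made-up toy sequence, not taken from either plastid genome.
toy = "ACGTTTTTTGAAAAC"
print(homopolymer_runs(toy))             # runs of 5 or more bases
print(homopolymer_runs(toy, min_len=4))  # runs of 4 or more bases
```

In a comparison like the one described, the positions of discrepancies between the pyrosequencing reads and the conventional reference sequence could be checked against the intervals returned by `homopolymer_runs` to estimate what fraction of errors fall inside long runs.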