Quinn Nicole L, Levenkova Natasha, Chow William, Bouffard Pascal, Boroevich Keith A, Knight James R, Jarvie Thomas P, Lubieniecki Krzysztof P, Desany Brian A, Koop Ben F, Harkins Timothy T, Davidson William S
Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada.
BMC Genomics. 2008 Aug 28;9:404. doi: 10.1186/1471-2164-9-404.
Having undergone a whole genome duplication event and being supported by a wealth of biological data, salmonids are excellent model organisms for studying evolutionary processes, the fates of duplicated genes, and the genetic and physiological processes associated with complex behavioral phenotypes. It is therefore surprising that no salmonid genome has been sequenced. Atlantic salmon (Salmo salar) is a good representative salmonid for sequencing, given its importance in aquaculture and the genomic resources available. However, the size and complexity of the genome, combined with the lack of a sequenced reference genome from a closely related fish, make assembly challenging. Given the cost and time limitations of Sanger sequencing, as well as recent improvements in next-generation sequencing technologies, we examined the feasibility of using the Genome Sequencer (GS) FLX pyrosequencing system to obtain the sequence of a salmonid genome. Eight pooled BACs belonging to a minimum tiling path covering approximately 1 Mb of the Atlantic salmon genome were sequenced by GS FLX shotgun and Long Paired End sequencing and compared with a ninth BAC sequenced by Sanger sequencing of a shotgun library.
An initial assembly using only GS FLX shotgun sequences (average read length 248.5 bp) at approximately 30x coverage allowed gene identification but remained incomplete, even when 126 Sanger-generated BAC-end sequences (approximately 0.09x coverage) were incorporated. Adding paired end sequencing reads (approximately 26x additional coverage) produced a final assembly comprising 175 contigs assembled into four scaffolds with 171 gaps. Sanger sequencing of the ninth BAC (approximately 10.5x coverage) produced nine contigs and two scaffolds. The number of scaffolds produced by the GS FLX assembly was comparable to that obtained by Sanger sequencing; however, the number of gaps was much higher in the GS FLX assembly.
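The coverage and gap figures above can be related with simple arithmetic. The following is a minimal sketch, not the authors' analysis: it assumes the target is exactly 1 Mb (the abstract says "approximately"), that coverage is total read bases divided by target length, and that every gap lies between two adjacent contigs within a scaffold. The function names are hypothetical.

```python
def coverage(n_reads, mean_read_len_bp, target_len_bp):
    """Fold coverage = total sequenced bases / target length."""
    return n_reads * mean_read_len_bp / target_len_bp


def reads_needed(fold_coverage, mean_read_len_bp, target_len_bp):
    """Number of reads required to reach a given fold coverage."""
    return fold_coverage * target_len_bp / mean_read_len_bp


def gaps_within_scaffolds(n_contigs, n_scaffolds):
    """Gaps between adjacent contigs when contigs are chained into scaffolds.

    A scaffold of k contigs has k - 1 internal gaps, so the total is
    (number of contigs) - (number of scaffolds).
    """
    return n_contigs - n_scaffolds


# Assumption: the ~1 Mb tiling path is treated as exactly 1,000,000 bp.
TARGET_BP = 1_000_000

# GS FLX shotgun: ~30x coverage at a 248.5 bp average read length.
shotgun_reads = reads_needed(30, 248.5, TARGET_BP)

# 126 BAC-end sequences at ~0.09x imply this mean read length (derived, not stated).
bac_end_mean_len = 0.09 * TARGET_BP / 126

# Gap counts implied by the contig and scaffold numbers in the text.
gs_flx_gaps = gaps_within_scaffolds(175, 4)
sanger_gaps = gaps_within_scaffolds(9, 2)
```

Under these assumptions, 175 contigs in four scaffolds give 171 gaps, which matches the reported figure. The same formula gives seven gaps for the Sanger-sequenced BAC, a value the abstract does not state. The read count (roughly 120,700) and the BAC-end read length (roughly 714 bp) are likewise derived estimates, not reported values.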
These results represent the first use of GS FLX paired end reads for de novo sequence assembly. Our data demonstrate that paired end reads improved the GS FLX assemblies; however, for de novo sequencing of complex genomes, GS FLX technology is limited to gene mining and establishing a set of ordered sequence contigs. Currently, for a salmonid reference sequence, it appears that a substantial portion of the sequencing should be done using Sanger technology.