The Centre for Applied Aquatic Genomics, Chinese Academy of Fishery Sciences, Beijing 100141, China.
BMC Genomics. 2013 Sep 8;14:604. doi: 10.1186/1471-2164-14-604.
Generating large mate-pair libraries is necessary for de novo genome assembly, but the procedure is complex and time-consuming. Furthermore, in some complex genomes it is hard to increase the N50 length even with large mate-pair libraries, which leads to low transcript coverage. Simpler scaffolding approaches are therefore needed, at least to extend the transcribed fragments.
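Because N50 is the metric used throughout, a minimal sketch of how it is commonly computed may help. The function name `n50` and the example lengths are illustrative assumptions, not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp) before scaffolding.
contigs = [100, 200, 300, 400, 500]
# The same bases after the three shortest contigs are joined into one scaffold.
scaffolds = [600, 500, 400]

print(n50(contigs), n50(scaffolds))
```

Joining fragments leaves the total number of bases unchanged but shifts them into longer sequences, which is why scaffolding raises N50.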
We describe L_RNA_scaffolder, a novel genome scaffolding method that uses long transcriptome reads to order, orient and combine genomic fragments into larger sequences. To demonstrate the accuracy of the method, we scaffolded the zebrafish genome. With expanded human transcriptome data, the N50 of the human genome was doubled, and L_RNA_scaffolder outperformed most results from existing scaffolders that employ mate-pair libraries. In these two examples, transcript coverage was almost complete, especially for long transcripts. When we applied L_RNA_scaffolder to the highly polymorphic pearl oyster draft genome, gene model length increased significantly.
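The general idea of ordering and orienting fragments with a transcript can be illustrated with a toy sketch. This is not the L_RNA_scaffolder implementation: the `Hit` record, the function names, the two-base gap and the example sequences are all assumptions made for illustration, and a real tool would also have to filter alignments and resolve conflicting links.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One aligned block of a transcript on a genomic fragment (hypothetical record)."""
    contig: str
    t_start: int  # start of the block on the transcript
    t_end: int    # end of the block on the transcript
    strand: str   # '+' or '-': orientation of the contig relative to the transcript


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def scaffold_path(hits):
    """Order contigs by where they align on one transcript, keeping each strand."""
    path = []
    for hit in sorted(hits, key=lambda h: h.t_start):
        if path and path[-1][0] == hit.contig:
            continue  # consecutive blocks on the same contig add no new link
        path.append((hit.contig, hit.strand))
    return path


def join(path, sequences, gap="NN"):
    """Concatenate contigs along the path, flipping '-' contigs and padding with a gap.

    The gap length is an arbitrary placeholder; the true distance is unknown
    because introns are absent from the transcript.
    """
    parts = []
    for contig, strand in path:
        seq = sequences[contig]
        parts.append(seq if strand == "+" else reverse_complement(seq))
    return gap.join(parts)


# Hypothetical alignments of one long transcript to two contigs.
hits = [
    Hit("c2", 500, 900, "-"),
    Hit("c1", 0, 480, "+"),
    Hit("c1", 480, 500, "+"),
]
sequences = {"c1": "AAAC", "c2": "GGTT"}

path = scaffold_path(hits)
scaffold = join(path, sequences)
print(path, scaffold)
```

In this sketch the transcript coordinate supplies the order and the alignment strand supplies the orientation, so a single long transcript spanning two fragments is enough to link them.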
The simplicity and high throughput of RNA-seq data make this approach suitable for genome scaffolding. L_RNA_scaffolder is available at http://www.fishbrowser.org/software/L_RNA_scaffolder.