Piro Vitor C, Faoro Helisson, Weiss Vinicius A, Steffens Maria B R, Pedrosa Fabio O, Souza Emanuel M, Raittz Roberto T
Laboratory of Bioinformatics, Professional and Technological Education Sector, Federal University of Paraná, Rua Dr. Alcides Vieira Arcoverde 1225, Curitiba, Paraná, Brazil.
BMC Res Notes. 2014 Jun 18;7:371. doi: 10.1186/1756-0500-7-371.
The rapid decline in DNA sequencing costs has allowed genome data to accumulate quickly. However, obtaining complete genome sequences is still very time-consuming and labor-intensive. In addition, data produced by different sequencing technologies, or by alternative assemblies, remain underexplored as a means of improving incomplete genome assemblies.
We have developed FGAP, a tool that closes gaps in draft genome sequences by drawing on different datasets. FGAP uses BLAST to align multiple contigs against a draft genome assembly in order to find sequences that overlap gaps. For each gap, the algorithm then selects the best candidate sequence to fill and eliminate it.
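The gap-closing idea described above can be illustrated with a minimal Python sketch. It is not FGAP's implementation: instead of BLAST alignments it uses exact matches of the sequences flanking each gap, and the rule for picking among several candidates (prefer the insert whose length is closest to the number of Ns in the gap) is an assumption made for illustration. The names `find_gaps`, `fill_gap`, `close_gaps` and the `flank` parameter are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gap:
    start: int  # 0-based index of the first 'N'
    end: int    # exclusive end of the run of 'N's


def find_gaps(scaffold: str) -> List[Gap]:
    """Locate gaps, represented as runs of 'N', in a draft scaffold."""
    return [Gap(m.start(), m.end()) for m in re.finditer(r"N+", scaffold.upper())]


def fill_gap(scaffold: str, gap: Gap, candidates: List[str], flank: int = 20) -> Optional[str]:
    """Return the scaffold with one gap filled, or None if no candidate spans it.

    A candidate contig spans the gap if it contains the left flank followed
    (later in the contig) by the right flank; the bases between them are the insert.
    Exact matching stands in for BLAST here (an assumption of this sketch).
    """
    if gap.start < flank or gap.end + flank > len(scaffold):
        return None  # not enough flanking sequence on one side
    left = scaffold[gap.start - flank:gap.start]
    right = scaffold[gap.end:gap.end + flank]
    if "N" in left.upper() or "N" in right.upper():
        return None  # flank overlaps another gap
    gap_len = gap.end - gap.start
    best: Optional[Tuple[int, str]] = None
    for contig in candidates:
        i = contig.find(left)
        if i == -1:
            continue
        j = contig.find(right, i + flank)
        if j == -1:
            continue
        insert = contig[i + flank:j]
        if "N" in insert.upper():
            continue  # the candidate itself has a gap here
        # Hypothetical selection rule: insert length closest to the gap size.
        score = abs(len(insert) - gap_len)
        if best is None or score < best[0]:
            best = (score, insert)
    if best is None:
        return None
    return scaffold[:gap.start] + best[1] + scaffold[gap.end:]


def close_gaps(scaffold: str, candidates: List[str], flank: int = 20) -> Tuple[str, int]:
    """Try to fill every gap; return the new scaffold and how many gaps were closed."""
    closed = 0
    # Work right to left so filling a gap never shifts the coordinates of gaps still to do.
    for gap in reversed(find_gaps(scaffold)):
        filled = fill_gap(scaffold, gap, candidates, flank)
        if filled is not None:
            scaffold = filled
            closed += 1
    return scaffold, closed
```

For example, `close_gaps("ACGTTGCATG" + "NNNNN" + "AGTCAATTGC", ["TTGCATGGATCCAGTCAAT"], flank=5)` replaces the five Ns with `GATCC` taken from the spanning contig. A real tool would tolerate mismatches in the flanks, which is what an aligner such as BLAST provides.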
FGAP reduced the number of gaps by 78% in an E. coli draft genome assembly using data from two different sequencing technologies, Illumina and 454. Using PacBio long reads, 98% of the gaps were closed. In human chromosome 14 assemblies, FGAP reduced the number of gaps by 35%. All inserted sequences were validated against a reference genome using QUAST. The source code and a web tool are available at http://www.bioinfo.ufpr.br/fgap/.