The Genome Institute, Washington University School of Medicine, 4444 Forest Park Avenue, St Louis, MO 63108, USA.
Bioinformatics. 2012 Jan 1;28(1):13-6. doi: 10.1093/bioinformatics/btr588. Epub 2011 Oct 23.
No single assembly algorithm addresses all of the known limitations of assembling short sequence reads. Short overall contig length is the major problem limiting the usefulness of these assemblies. We describe an algorithm that takes advantage of different assembly algorithms or sequencing platforms to improve the quality of next-generation sequencing (NGS) assemblies.
The algorithm is implemented as a graph accordance assembly (GAA) program. It constructs an accordance graph that captures the mapping information between the target and query assemblies. Based on this graph, the contigs or scaffolds of the target assembly can be extended, merged or bridged together. Extra constraints, including gap sizes, mate pairs, and scaffold order and orientation, are used to ensure that these accordance operations are applied in the correct context. We applied GAA to various chicken NGS assemblies, and the results show improved contiguity statistics and higher genome and gene coverage.
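The minimal Python sketch below illustrates only the bridging step of an accordance graph. It is not GAA's Perl implementation. The `Hit` record, the functions `build_accordance_graph` and `supported_bridges`, and the `min_support`/`max_gap` thresholds are hypothetical. The sketch assumes that query-to-target alignments have already been computed, considers forward-strand hits only, and ignores mate-pair, scaffold-order and orientation constraints. Two target contigs are linked when one query contig aligns to both, and the gap between them is estimated from the query coordinates.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One alignment of a query-contig segment to a target contig (hypothetical format).

    Assumption: all hits are on the forward strand.
    """
    query: str    # query contig id
    target: str   # target contig id
    q_start: int  # start on the query contig (0-based, inclusive)
    q_end: int    # end on the query contig (0-based, exclusive)


Edges = Dict[Tuple[str, str], List[int]]


def build_accordance_graph(hits: List[Hit]) -> Edges:
    """Return {(left_target, right_target): [gap, ...]} bridging edges.

    Consecutive hits along the same query contig link two target contigs;
    each query contig contributes one gap estimate per link.
    """
    by_query: Dict[str, List[Hit]] = defaultdict(list)
    for h in hits:
        by_query[h.query].append(h)

    edges: Edges = defaultdict(list)
    for query_hits in by_query.values():
        # Order the hits by their position along the query contig.
        query_hits.sort(key=lambda h: h.q_start)
        for left, right in zip(query_hits, query_hits[1:]):
            if left.target == right.target:
                continue  # both hits on one target contig: nothing to bridge
            gap = right.q_start - left.q_end  # negative => targets overlap
            edges[(left.target, right.target)].append(gap)
    return dict(edges)


def supported_bridges(edges: Edges, min_support: int = 2,
                      max_gap: int = 1000) -> Dict[Tuple[str, str], float]:
    """Keep links backed by >= min_support query contigs whose gaps are all
    <= max_gap, and report the mean gap. Thresholds are illustrative only."""
    result = {}
    for pair, gaps in edges.items():
        if len(gaps) >= min_support and max(gaps) <= max_gap:
            result[pair] = sum(gaps) / len(gaps)
    return result


# Toy alignments: q1 and q2 both span target contigs A -> B; only q3 spans B -> C.
EXAMPLE_HITS = [
    Hit("q1", "A", 0, 500), Hit("q1", "B", 600, 900),
    Hit("q2", "A", 10, 400), Hit("q2", "B", 520, 800),
    Hit("q3", "B", 0, 300), Hit("q3", "C", 250, 700),
]

if __name__ == "__main__":
    print(supported_bridges(build_accordance_graph(EXAMPLE_HITS)))
    # prints {('A', 'B'): 110.0}
```

With the example hits, A and B are linked by two query contigs with gaps of 100 and 120 (mean 110), while the B/C link has only one supporting query contig and is dropped. Requiring independent support and a plausible gap size is one simple way to express a gap-size constraint of the kind described above.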
GAA is implemented in object-oriented Perl and is available at http://sourceforge.net/projects/gaa-wugi/.