Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305-5120, USA.
Bioinformatics. 2012 May 15;28(10):1324-7. doi: 10.1093/bioinformatics/bts123. Epub 2012 Mar 13.
Ultra-high-throughput sequencing produces duplicate and near-duplicate reads, which can consume computational resources in downstream applications. A tool that collapses such reads should reduce storage and assembly complications and costs.
We developed Fulcrum to collapse identical and near-identical Illumina and 454 reads (such as those from PCR clones) into single error-corrected sequences; it can process paired-end as well as single-end reads. Fulcrum is customizable and can be deployed on a single machine, a local network or a commercially available MapReduce cluster, and it has been optimized to maximize ease-of-use, cross-platform compatibility and future scalability. Sequence datasets have been collapsed by up to 71%, and the reduced number and improved quality of the resulting sequences allow assemblers to produce longer contigs while using less memory.
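The abstract does not describe how Fulcrum groups or corrects reads, so the following is only a minimal sketch of the general idea of collapsing near-identical reads into one error-corrected sequence; it is not Fulcrum's algorithm. It assumes reads arrive as `(sequence, quality_scores)` pairs, that candidates are binned by length and a shared prefix, and that a read joins a cluster when it is within a small Hamming distance of the cluster's first read. The names `collapse_reads`, `prefix_len` and `max_mismatches` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def consensus(members):
    """Quality-weighted vote per position; members are (seq, quals) pairs."""
    length = len(members[0][0])
    out = []
    for i in range(length):
        votes = defaultdict(int)
        for seq, quals in members:
            votes[seq[i]] += quals[i]
        # sorted() makes tie-breaking deterministic (alphabetical order).
        out.append(max(sorted(votes), key=votes.get))
    return "".join(out)


def collapse_reads(reads, prefix_len=8, max_mismatches=1):
    """Return (consensus_sequence, read_count) for each cluster of similar reads.

    Assumption: reads in a bin share length and prefix, and each cluster is
    seeded by the first read that fits no existing cluster.
    """
    bins = defaultdict(list)
    for seq, quals in reads:
        bins[(len(seq), seq[:prefix_len])].append((seq, quals))

    collapsed = []
    for key in sorted(bins):
        clusters = []
        for seq, quals in bins[key]:
            for cluster in clusters:
                if hamming(seq, cluster[0][0]) <= max_mismatches:
                    cluster.append((seq, quals))
                    break
            else:
                clusters.append([(seq, quals)])
        collapsed.extend((consensus(c), len(c)) for c in clusters)
    return collapsed


example_reads = [
    ("ACGTACGTAC", [30] * 10),
    ("ACGTACGTAC", [30] * 10),
    ("ACGTACGTAT", [30] * 9 + [5]),  # low-quality mismatch at the last base
    ("TTTTACGTAC", [30] * 10),
]
print(collapse_reads(example_reads))
# [('ACGTACGTAC', 3), ('TTTTACGTAC', 1)]
```

Binning by prefix keeps the pairwise comparisons local to each bin, which is also the kind of partitioning that would map naturally onto separate workers in a MapReduce-style deployment; whether Fulcrum partitions its input this way is an assumption here.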