Lysholm Fredrik, Andersson Björn, Persson Bengt
IFM Bioinformatics and SeRC (Swedish e-Science Research Centre), Linköping University, S-581 83 Linköping, Sweden.
BMC Res Notes. 2011 Oct 26;4:449. doi: 10.1186/1756-0500-4-449.
Roche 454 is one of the major second-generation sequencing platforms. The particular characteristics of 454 sequence data pose new challenges for bioinformatic analyses, e.g. assembly and alignment search algorithms. Simulating these data is therefore useful for assessing how bioinformatic applications and algorithms handle 454 data.
We developed a new application named 454sim that simulates 454 data at high speed and with high accuracy. The program supports multi-threading and is available as C++ source code or as pre-compiled binaries. 454sim simulates sequence reads using a set of statistical models for each chemistry: it simulates the recorded peak intensities and the deterioration of peak quality, and it calculates quality values. All three generations of the Roche 454 chemistry ('GS20', 'GS FLX' and 'Titanium') are supported and are defined in external text files for easy access and tweaking.
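As a rough illustration of what simulating peak intensities involves, the following Python sketch converts a template sequence into ideal per-flow homopolymer lengths, perturbs them with Gaussian noise that grows with homopolymer length and flow index, and then calls bases by rounding. It is not the 454sim implementation: the function names, the Gaussian noise model and the parameters `sd_floor`, `sd_per_base` and `decay` are all assumptions chosen for illustration, and the chemistry-specific models that 454sim reads from its external text files are not reproduced here.

```python
import random

FLOW_ORDER = "TACG"  # cyclic nucleotide flow order, assumed here for 454 runs


def homopolymer_flows(seq, n_flows, flow_order=FLOW_ORDER):
    """Return the ideal (noise-free) homopolymer length seen in each flow."""
    counts = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        # Consume every consecutive template base matching the flowed nucleotide.
        while pos < len(seq) and seq[pos] == base:
            run += 1
            pos += 1
        counts.append(run)
    return counts


def simulate_flowgram(seq, n_flows, rng,
                      sd_floor=0.1, sd_per_base=0.05, decay=0.0005):
    """Perturb ideal flow values with Gaussian noise (hypothetical model).

    The standard deviation grows with homopolymer length (sd_per_base)
    and with the flow index (decay), mimicking signal deterioration
    along the read. All three parameters are illustrative, not fitted.
    """
    flowgram = []
    for i, n in enumerate(homopolymer_flows(seq, n_flows)):
        sd = (sd_floor + sd_per_base * n) * (1.0 + decay * i)
        # Intensities cannot be negative, so clamp at zero.
        flowgram.append(max(0.0, rng.gauss(n, sd)))
    return flowgram


def call_bases(flowgram, flow_order=FLOW_ORDER):
    """Call bases by rounding each intensity to the nearest integer."""
    called = []
    for i, value in enumerate(flowgram):
        called.append(flow_order[i % len(flow_order)] * int(value + 0.5))
    return "".join(called)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so repeated runs are reproducible
    flows = simulate_flowgram("TTACGGGA", 12, rng)
    print([round(v, 2) for v in flows])
    print(call_bases(flows))
```

Because the noise in this sketch scales with homopolymer length, long runs of one base are the most likely to be miscalled after rounding, which is the kind of error that assembly and alignment tools need to tolerate in 454 data.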
We present a new platform-independent application named 454sim. 454sim is generally 200 times faster than previous programs and allows simple adjustment of the statistical models. These improvements make it possible to carry out more complex and rigorous algorithm evaluations within a reasonable time.