Lassmann Timo
RIKEN Center for Life Science Technologies (CLST), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Kanagawa, Japan.
Telethon Kids Institute, The University of Western Australia, 100 Roberts Road, Subiaco, 6008, Western Australia, Australia.
BMC Bioinformatics. 2015 Jan 28;16:24. doi: 10.1186/s12859-015-0454-y.
Arguably the most basic step in the analysis of next generation sequencing (NGS) data is the extraction of mappable reads from the raw reads produced by sequencing instruments. Barcodes, adaptors and artifacts, all of which are subject to sequencing errors, make this step non-trivial.
Here I present TagDust2, a generic approach that uses a library of hidden Markov models (HMMs) to accurately extract reads from a wide array of possible read architectures. TagDust2 extracts more reads of higher quality than other approaches. It fully supports multiplexed single-end and paired-end libraries, as well as libraries containing unique molecular identifiers. Two additional post-processing steps exclude known contaminants and filter out low-complexity sequences. Finally, TagDust2 can automatically detect the library type of sequenced data from a predefined selection.
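To make the notion of a read architecture concrete, the following is a minimal sketch of pulling a sample barcode, a unique molecular identifier and the mappable insert out of one raw read. It is not TagDust2's method: it assumes a hypothetical fixed layout (barcode, then identifier, then insert, then an optional 3' adaptor) and replaces the HMM with simple Hamming-distance matching. All names here (extract_read, ExtractedRead, max_mismatches) and the example sequences are illustrative assumptions.

```python
from typing import Dict, NamedTuple, Optional


class ExtractedRead(NamedTuple):
    sample: str  # sample name assigned from the barcode
    umi: str     # unique molecular identifier
    insert: str  # the part of the read that would be mapped


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extract_read(
    raw: str,
    barcodes: Dict[str, str],  # barcode sequence -> sample name (hypothetical)
    umi_len: int,
    adapter: str,
    max_mismatches: int = 1,
) -> Optional[ExtractedRead]:
    """Split a raw read laid out as barcode + UMI + insert [+ adapter].

    Assumes `barcodes` is non-empty and all barcodes share one length.
    Returns None when the read cannot be assigned unambiguously.
    """
    bc_len = len(next(iter(barcodes)))
    if len(raw) < bc_len + umi_len + 1:
        return None  # too short to contain any insert

    # Rank known barcodes by distance to the observed prefix.
    observed = raw[:bc_len]
    scored = sorted((hamming(observed, bc), name) for bc, name in barcodes.items())
    best_dist, best_name = scored[0]
    if best_dist > max_mismatches:
        return None  # no barcode is close enough
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # two barcodes fit equally well: ambiguous

    umi = raw[bc_len:bc_len + umi_len]
    rest = raw[bc_len + umi_len:]

    # Trim an exact copy of the 3' adapter if present (no error tolerance here).
    cut = rest.find(adapter)
    insert = rest if cut == -1 else rest[:cut]
    if not insert:
        return None
    return ExtractedRead(best_name, umi, insert)
```

Unlike this sketch, a probabilistic model can tolerate insertions and deletions and can score competing architectures against each other, which is the motivation for the HMM-based approach described above.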
Taken together, TagDust2 is a feature-rich, flexible and adaptive solution for going from raw to mappable NGS reads in a single step. The ability to recognize and record the contents of raw reads will help to automate and demystify the initial, and often poorly documented, steps in NGS data analysis pipelines. TagDust2 is freely available at http://tagdust.sourceforge.net.