TagDust2: a generic method to extract reads from sequencing data.

Author information

Lassmann Timo

Affiliations

RIKEN Center for Life Science Technologies (CLST), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Kanagawa, Japan.

Telethon Kids Institute, The University of Western Australia, 100 Roberts Road, Subiaco, 6008, Western Australia, Australia.

Publication information

BMC Bioinformatics. 2015 Jan 28;16:24. doi: 10.1186/s12859-015-0454-y.

DOI:10.1186/s12859-015-0454-y

PMID:25627334

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4384298/

Abstract

BACKGROUND

Arguably the most basic step in the analysis of next generation sequencing (NGS) data is the extraction of mappable reads from the raw reads produced by sequencing instruments. The presence of barcodes, adaptors and artifacts, all of which are subject to sequencing errors, makes this step non-trivial.

RESULTS

Here I present TagDust2, a generic approach that uses a library of hidden Markov models (HMMs) to accurately extract reads from a wide array of possible read architectures. TagDust2 extracts more reads of higher quality than other approaches. Processing of multiplexed single-end and paired-end libraries, as well as libraries containing unique molecular identifiers, is fully supported. Two additional post-processing steps are included to exclude known contaminants and filter out low-complexity sequences. Finally, TagDust2 can automatically detect the library type of sequenced data from a predefined selection.
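
To make the idea of a "read architecture" concrete, the following is a minimal sketch of splitting a raw read laid out as barcode + insert + 3' adaptor. It is not TagDust2's method: a plain Hamming-distance matcher stands in for the HMM library described above, so it tolerates substitutions but not insertions or deletions. The function names, the example barcodes, the adaptor sequence and the `max_mismatches` parameter are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extract_read(
    raw: str,
    barcodes: Sequence[str],
    adaptor: str,
    max_mismatches: int = 1,
) -> Optional[Tuple[str, str]]:
    """Split a raw read assumed to be barcode + insert + 3' adaptor.

    Returns (barcode, insert), or None if no barcode matches within
    the mismatch tolerance. Assumes all barcodes share one length.
    """
    blen = len(barcodes[0])
    prefix = raw[:blen]
    if len(prefix) < blen:
        return None  # read too short to contain a barcode

    # Pick the closest barcode; ties are resolved by list order.
    best = min(barcodes, key=lambda b: hamming(prefix, b))
    if hamming(prefix, best) > max_mismatches:
        return None

    rest = raw[blen:]
    # Scan for the first window that matches the adaptor within tolerance.
    for i in range(len(rest) - len(adaptor) + 1):
        if hamming(rest[i:i + len(adaptor)], adaptor) <= max_mismatches:
            return best, rest[:i]

    # No adaptor found: keep the whole remainder as the insert.
    return best, rest


if __name__ == "__main__":
    barcodes = ["ACGT", "TGCA"]   # hypothetical sample barcodes
    adaptor = "AGATCGG"           # hypothetical 3' adaptor fragment

    # Clean read: barcode, insert and adaptor are all exact.
    print(extract_read("ACGT" + "TTGACCA" + adaptor, barcodes, adaptor))
    # ('ACGT', 'TTGACCA')

    # One sequencing error in the barcode is still assigned to ACGT.
    print(extract_read("ACGA" + "TTGACCA" + adaptor, barcodes, adaptor))
    # ('ACGT', 'TTGACCA')

    # A prefix far from every barcode is rejected.
    print(extract_read("GGGG" + "TTGACCA" + adaptor, barcodes, adaptor))
    # None
```

A fixed mismatch threshold like this is the kind of ad hoc rule that a probabilistic model can replace: an HMM can score substitutions, insertions and deletions across the whole read at once instead of treating each segment separately.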

CONCLUSION

Taken together, TagDust2 is a feature-rich, flexible and adaptive solution for going from raw to mappable NGS reads in a single step. The ability to recognize and record the contents of raw reads will help to automate and demystify the initial, and often poorly documented, steps in NGS data analysis pipelines. TagDust2 is freely available at: http://tagdust.sourceforge.net
