

MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping.

Author information

Lee Wan-Ping, Stromberg Michael P, Ward Alistair, Stewart Chip, Garrison Erik P, Marth Gabor T

Affiliations

Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America.

Department of Biology, Boston College, Chestnut Hill, Massachusetts, United States of America; Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.

Publication information

PLoS One. 2014 Mar 5;9(3):e90581. doi: 10.1371/journal.pone.0090581. eCollection 2014.

Abstract

MOSAIK is a stable, sensitive and open-source program for mapping second- and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific Biosciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well suited to capturing mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, such as mobile element insertions (MEIs), and generates outputs tailored to aid SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network-based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient greater than 0.98 between MOSAIK-assigned and actual mapping qualities. To support studies of any genome, a training pipeline is provided that optimizes mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
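
The general idea of pairing hash-based seeding and clustering with Smith-Waterman alignment can be illustrated with a small sketch. This is not MOSAIK's implementation: the function names, the k-mer size, the `band` and `pad` values, and the scoring parameters are all assumptions chosen for illustration, and the clustering here simply groups seed hits that fall on nearby diagonals.

```python
from collections import defaultdict


def build_hash_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def cluster_seed_hits(read, index, k, band=5):
    """Group seed hits by diagonal (ref_pos - read_pos).

    Diagonals within `band` of the previous one join the same cluster.
    `band` is an illustrative assumption, not a MOSAIK parameter.
    """
    diagonals = sorted(
        ref_pos - read_pos
        for read_pos in range(len(read) - k + 1)
        for ref_pos in index.get(read[read_pos:read_pos + k], ())
    )
    clusters = []
    for d in diagonals:
        if clusters and d - clusters[-1][-1] <= band:
            clusters[-1].append(d)
        else:
            clusters.append([d])
    return clusters


def smith_waterman(read, window, match=2, mismatch=-1, gap=-2):
    """Best local alignment score and its end column in `window` (linear gaps)."""
    prev = [0] * (len(window) + 1)
    best, best_end = 0, 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(window) + 1)
        for j in range(1, len(window) + 1):
            step = match if read[i - 1] == window[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best, best_end = curr[j], j
        prev = curr
    return best, best_end


def map_read(read, reference, index, k, pad=5):
    """Align the read only within windows suggested by seed clusters.

    Returns (score, window_start, seed_count) for the best cluster, or None
    if no seed hits were found.
    """
    best = None
    for cluster in cluster_seed_hits(read, index, k):
        start = max(0, cluster[0] - pad)
        end = min(len(reference), cluster[-1] + len(read) + pad)
        score, _ = smith_waterman(read, reference[start:end])
        if best is None or score > best[0]:
            best = (score, start, len(cluster))
    return best
```

In this sketch the hash index restricts the expensive dynamic-programming step to a few short reference windows, while the local alignment inside each window tolerates mismatches and short gaps, which is the property the abstract attributes to the combined strategy.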


