Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points.

Affiliations

Department of Bioinformatics, Institute of Microbiology and Genetics.

Center for Computational Sciences, University of Goettingen, Goettingen, Germany.

Publication information

Bioinformatics. 2019 Jan 15;35(2):211-218. doi: 10.1093/bioinformatics/bty592.

Abstract

MOTIVATION

Most methods for pairwise and multiple genome alignment use fast local homology search tools to identify anchor points, i.e. high-scoring local alignments of the input sequences. Sequence segments between those anchor points are then aligned with slower, more sensitive methods. Finding suitable anchor points is therefore crucial for genome sequence comparison; speed and sensitivity of genome alignment depend on the underlying anchoring methods.

RESULTS

In this article, we use filtered spaced word matches to generate anchor points for genome alignment. For a given binary pattern representing match and don't-care positions, we first search for spaced-word matches, i.e. ungapped local pairwise alignments with matching nucleotides at the match positions of the pattern and possible mismatches at the don't-care positions. Those spaced-word matches that have similarity scores above some threshold value are then extended using a standard X-drop algorithm; the resulting local alignments are used as anchor points. To evaluate this approach, we used the popular multiple-genome-alignment pipeline Mugsy and replaced the exact word matches that Mugsy uses as anchor points with our spaced-word-based anchor points. For closely related genome sequences, the two anchoring procedures lead to multiple alignments of similar quality. For distantly related genomes, however, alignments calculated with our filtered-spaced-word matches are superior to alignments produced with the original Mugsy program where exact word matches are used to find anchor points.
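
The anchoring steps described above (pattern-based seeding, score filtering, X-drop extension) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the +1/-1 match/mismatch scores, the choice to score every position of the window, and the ungapped form of the X-drop extension are all assumptions made for illustration; the abstract does not specify the scoring scheme or the exact X-drop variant.

```python
from collections import defaultdict

# Illustrative scores only (assumption); the abstract does not give the scoring scheme.
MATCH, MISMATCH = 1, -1


def spaced_words(seq, pattern):
    """Map each spaced word (characters at the '1' positions) to its start positions."""
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    index = defaultdict(list)
    for start in range(len(seq) - len(pattern) + 1):
        word = "".join(seq[start + i] for i in match_pos)
        index[word].append(start)
    return index


def window_score(a, b, i, j, length):
    """Score an ungapped window, counting match and don't-care positions alike."""
    return sum(MATCH if a[i + k] == b[j + k] else MISMATCH for k in range(length))


def filtered_matches(a, b, pattern, threshold):
    """Return sorted (i, j) starts of spaced-word matches scoring above threshold."""
    index_b = spaced_words(b, pattern)
    hits = []
    for word, starts_a in spaced_words(a, pattern).items():
        for i in starts_a:
            for j in index_b.get(word, []):
                if window_score(a, b, i, j, len(pattern)) > threshold:
                    hits.append((i, j))
    return sorted(hits)


def xdrop_extend(a, b, i, j, step, x):
    """Ungapped extension from (i, j) in direction step (+1 or -1).

    Stops once the running score falls more than x below the best score seen,
    and returns the number of positions up to that best score.
    """
    score = best = best_len = 0
    k = 0
    while 0 <= i < len(a) and 0 <= j < len(b):
        score += MATCH if a[i] == b[j] else MISMATCH
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score > x:
            break
        i += step
        j += step
    return best_len


def anchors(a, b, pattern, threshold, x):
    """Extend every filtered match in both directions; return (start_a, start_b, length)."""
    w = len(pattern)
    found = set()  # neighbouring seeds often extend to the same anchor
    for i, j in filtered_matches(a, b, pattern, threshold):
        left = xdrop_extend(a, b, i - 1, j - 1, -1, x)
        right = xdrop_extend(a, b, i + w, j + w, +1, x)
        found.add((i - left, j - left, left + w + right))
    return sorted(found)
```

For example, with the pattern `101`, the sequences `ACGT` and `ATGA` share the spaced word `AG` at position 0 despite a mismatch at the don't-care position, so `filtered_matches("ACGT", "ATGA", "101", 0)` returns `[(0, 0)]`, whereas an exact word match of length 3 would find nothing. Raising the threshold to 1 filters this match out again.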

AVAILABILITY AND IMPLEMENTATION

http://spacedanchor.gobics.de.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
