Global, highly specific and fast filtering of alignment seeds.

Affiliations

Institute for Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Str. 47, 17489, Greifswald, Germany.

Center for Functional Genomics of Microbes, University of Greifswald, Felix-Hausdorff-Str. 8, 17489, Greifswald, Germany.

Publication information

BMC Bioinformatics. 2022 Jun 10;23(1):225. doi: 10.1186/s12859-022-04745-4.

Abstract

BACKGROUND

Seed finding is an important initial phase of arguably most homology search and alignment methods, such as those required for genome alignment. This step is crucial for curbing the runtime, because potential alignments are restricted to, and anchored at, the sequence position pairs that constitute the seeds. To identify seeds, it is good practice to use sets of spaced seed patterns, a method that compares two sequences locally and requires exact matches only at certain positions.
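To make the idea of a spaced seed pattern concrete, the following is a minimal sketch, not code from the paper. It assumes a pattern is written as a string of `1` (position must match) and `0` (position is ignored); the function names `spaced_kmer` and `find_seeds` and the example pattern are hypothetical.

```python
from collections import defaultdict


def spaced_kmer(seq, start, pattern):
    """Return the characters of seq under the '1' positions of pattern."""
    return "".join(seq[start + k] for k, c in enumerate(pattern) if c == "1")


def find_seeds(seq_a, seq_b, pattern):
    """Return all position pairs (i, j) where seq_a and seq_b agree
    at every '1' position of the pattern (hypothetical helper)."""
    span = len(pattern)
    # Index every window of seq_a by its spaced k-mer.
    index = defaultdict(list)
    for i in range(len(seq_a) - span + 1):
        index[spaced_kmer(seq_a, i, pattern)].append(i)
    # Look up every window of seq_b in the index.
    seeds = []
    for j in range(len(seq_b) - span + 1):
        for i in index.get(spaced_kmer(seq_b, j, pattern), []):
            seeds.append((i, j))
    return seeds


# The two sequences differ at position 2, which the pattern "1101" ignores,
# so the windows starting at position 0 still form a seed.
seeds = find_seeds("ACGTACGT", "ACCTACGT", "1101")
```

Note that the short example also yields off-diagonal pairs such as `(4, 0)`, which illustrates why seeds found by purely local comparison include many chance matches that a later filtering step has to remove.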

RESULTS

We introduce a new method for filtering alignment seeds that we call geometric hashing. Geometric hashing achieves high specificity by combining non-local information from different seeds using a simple hash function that requires only a small, constant amount of additional time per spaced seed. Geometric hashing was tested on the task of finding homologous positions in the coding regions of human and mouse genome sequences. In this test, the number of false positives decreased about a million-fold relative to sets of spaced seeds, while a very high sensitivity was maintained.
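The abstract does not specify the hash function, so the following is only an illustrative sketch of how non-local information from several seeds might be combined in constant time per seed. It assumes, as a stand-in and not as the authors' method, that each seed `(i, j)` is hashed to a bin derived from its offset `j - i`, and that seeds in sparsely populated bins are discarded. The names `filter_seeds_by_offset`, `tile_size` and `min_support` are hypothetical.

```python
from collections import Counter


def filter_seeds_by_offset(seeds, tile_size=100, min_support=3):
    """Keep seeds whose offset bin is shared by at least min_support seeds.

    Assumption for illustration: the hash of a seed (i, j) is the integer
    bin of its offset j - i. Each seed is hashed once, so the extra work
    per seed is constant.
    """
    def tile(seed):
        i, j = seed
        return (j - i) // tile_size

    counts = Counter(tile(s) for s in seeds)
    return [s for s in seeds if counts[tile(s)] >= min_support]


# Three seeds with similar offsets support each other; the isolated
# seed with a very different offset is filtered out.
candidates = [(10, 1010), (50, 1060), (200, 1220), (300, 5000)]
kept = filter_seeds_by_offset(candidates)
```

The design choice in this sketch is that seeds from a truly homologous region tend to share a similar offset, whereas chance matches scatter across many offsets, so a simple count per bin separates the two groups without any pairwise comparison of seeds.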

CONCLUSIONS

An additional geometric hashing filtering phase could improve the run-time, accuracy or both of programs for various homology-search-and-align tasks.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/489c/9188137/f9d9acc82eda/12859_2022_4745_Fig1_HTML.jpg
