Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btad014.
Protein-to-genome alignment is critical to annotating genes in non-model organisms. While there are a few tools for this purpose, all of them were developed over 10 years ago and do not incorporate the latest advances in alignment algorithms. They are inefficient and cannot keep up with the rapid production of new genomes and the quickly growing protein databases.
Here, we describe miniprot, a new aligner for mapping protein sequences to a complete genome. Miniprot integrates recent techniques such as k-mer sketching and vectorized dynamic programming. It is tens of times faster than existing tools while achieving comparable accuracy on real data.
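To illustrate what a k-mer sketch is, the following minimal Python example computes a generic minimizer sketch of a sequence: from every window of `w` consecutive k-mers it keeps only the lexicographically smallest one. This is an illustrative sketch under stated assumptions and not miniprot's actual indexing scheme; the function name `minimizer_sketch`, the parameters `k` and `w`, and the lexicographic ordering are all hypothetical choices made for the example.

```python
from typing import List, Tuple


def minimizer_sketch(seq: str, k: int = 5, w: int = 4) -> List[Tuple[int, str]]:
    """Return (position, k-mer) pairs for a generic minimizer sketch.

    Assumption: k-mers are ordered lexicographically. Practical tools
    usually order them by a hash value instead.
    """
    # All overlapping k-mers of the sequence, indexed by start position.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    sketch: List[Tuple[int, str]] = []
    # Slide a window over w consecutive k-mers.
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Smallest k-mer in the window; ties resolve to the leftmost one.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        # Adjacent windows often select the same k-mer; record it once.
        if not sketch or sketch[-1][0] != pos:
            sketch.append((pos, window[offset]))
    return sketch


if __name__ == "__main__":
    # Hypothetical peptide fragment, used only to exercise the function.
    for pos, kmer in minimizer_sketch("MKVLAAGIVGLLLAQWERTYHHKDDE"):
        print(pos, kmer)
```

Because two sequences that share a sufficiently long exact substring also share the minimizers inside it, an index built on the sketch alone can locate candidate matches while storing only a fraction of all k-mers.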