Protein-to-genome alignment with miniprot.

Affiliations

Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA.

Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.

Publication information

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btad014.

Abstract

MOTIVATION

Protein-to-genome alignment is critical to annotating genes in non-model organisms. While there are a few tools for this purpose, all of them were developed over 10 years ago and did not incorporate the latest advances in alignment algorithms. They are inefficient and could not keep up with the rapid production of new genomes and quickly growing protein databases.

RESULTS

Here, we describe miniprot, a new aligner for mapping protein sequences to a complete genome. Miniprot integrates recent techniques such as k-mer sketch and vectorized dynamic programming. It is tens of times faster than existing tools while achieving comparable accuracy on real data.

AVAILABILITY AND IMPLEMENTATION

https://github.com/lh3/miniprot.
