Spaln3: improvement in speed and accuracy of genome mapping and spliced alignment of protein query sequences.

Affiliations

Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8562, Japan.

Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo 135-0064, Japan.

Publication information

Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae517.

DOI:10.1093/bioinformatics/btae517
PMID:39152995
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11361809/

Abstract

MOTIVATION

Spaln is the earliest practical tool for self-sufficient genome mapping and spliced alignment of protein query sequences onto a mammalian-sized eukaryotic genomic sequence. However, its computational speed has become inadequate for the analysis of rapidly growing genomic and transcript sequence data.

RESULTS

The dynamic programming calculation of Spaln has been sped up in two ways: (i) the introduction of the multi-intermediate unidirectional Hirschberg method and (ii) SIMD-based vectorization. The new version, Spaln3, is ∼7 times faster than the latest Spaln version 2, and its gene prediction accuracy is consistently higher than that of Miniprot.
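
The idea behind a unidirectional, multi-intermediate linear-space alignment can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not Spaln3's implementation: it uses plain Needleman-Wunsch scoring with a linear gap penalty, no introns or splice signals, and no SIMD. The function name `score_with_crossings` and its scoring parameters are hypothetical. In a single forward pass that keeps only two DP rows, it records the column at which one optimal path leaves each chosen intermediate row.

```python
def score_with_crossings(a, b, rows, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a vs b, keeping only two DP rows.

    `rows` lists intermediate row indices r with 0 < r < len(a). For each one
    (in ascending row order) the function also returns the column at which
    one optimal path leaves row r of the DP matrix.
    """
    marked = set(rows)
    n = len(b)
    score = [j * gap for j in range(n + 1)]  # DP row 0
    cross = [() for _ in range(n + 1)]       # crossing columns recorded so far
    for i in range(1, len(a) + 1):
        new_score = [i * gap] + [0] * n
        new_cross = [cross[0]] + [()] * n
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            best, origin = score[j - 1] + sub, cross[j - 1]   # diagonal move
            if score[j] + gap > best:                         # gap in b
                best, origin = score[j] + gap, cross[j]
            if new_score[j - 1] + gap > best:                 # gap in a
                best, origin = new_score[j - 1] + gap, new_cross[j - 1]
            new_score[j], new_cross[j] = best, origin
        if i in marked:
            # Any path that ends at cell (i, j) leaves row i at column j.
            new_cross = [c + (j,) for j, c in enumerate(new_cross)]
        score, cross = new_score, new_cross
    return score[n], cross[n]


total, cols = score_with_crossings("GATTACA", "GCATGCU", [2, 4, 6])
print(total, cols)
```

The recorded crossing points cut the DP matrix into independent blocks, each of which can be aligned recursively in the same way. That is what allows a full alignment to be recovered without storing the whole matrix.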

AVAILABILITY AND IMPLEMENTATION

https://github.com/ogotoh/spaln.

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0bd/11361809/fe4f31fde461/btae517f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0bd/11361809/c86ea9e0e9b8/btae517f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0bd/11361809/7c9aa380a890/btae517f3.jpg

Similar articles

1
Spaln3: improvement in speed and accuracy of genome mapping and spliced alignment of protein query sequences.
Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae517.
2
A space-efficient and accurate method for mapping and aligning cDNA sequences onto genomic sequence.
Nucleic Acids Res. 2008 May;36(8):2630-8. doi: 10.1093/nar/gkn105. Epub 2008 Mar 15.
3
Direct mapping and alignment of protein sequences onto genomic sequence.
Bioinformatics. 2008 Nov 1;24(21):2438-44. doi: 10.1093/bioinformatics/btn460. Epub 2008 Aug 26.
4
Cooperation of Spaln and Prrn5 for Construction of Gene-Structure-Aware Multiple Sequence Alignment.
Methods Mol Biol. 2021;2231:71-88. doi: 10.1007/978-1-0716-1036-7_5.
5
Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features.
Nucleic Acids Res. 2012 Nov 1;40(20):e161. doi: 10.1093/nar/gks708. Epub 2012 Jul 30.
6
Protein-to-genome alignment with miniprot.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btad014.
7
SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups.
BMC Bioinformatics. 2019 Mar 29;20(Suppl 3):133. doi: 10.1186/s12859-019-2647-2.
8
GMAP and GSNAP for Genomic Sequence Alignment: Enhancements to Speed, Accuracy, and Functionality.
Methods Mol Biol. 2016;1418:283-334. doi: 10.1007/978-1-4939-3578-9_15.
9
Sim4cc: a cross-species spliced alignment program.
Nucleic Acids Res. 2009 Jun;37(11):e80. doi: 10.1093/nar/gkp319. Epub 2009 May 8.
10
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.
BMC Bioinformatics. 2016 Feb 10;17:81. doi: 10.1186/s12859-016-0930-z.

Cited by

1
The impact of telomere-to-telomere genome assembly in the plant pan-genomics era.
Breed Sci. 2025 Mar;75(1):3-12. doi: 10.1270/jsbbs.24065. Epub 2025 Feb 21.

References

1
Protein-to-genome alignment with miniprot.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btad014.
2
BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database.
NAR Genom Bioinform. 2021 Jan 6;3(1):lqaa108. doi: 10.1093/nargab/lqaa108. eCollection 2021 Mar.
3
Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning.
Bioinformatics. 2021 Apr 1;36(22-23):5291-5298. doi: 10.1093/bioinformatics/btaa1044.
4
Minimap2: pairwise alignment for nucleotide sequences.
Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191.
5
Using intron position conservation for homology-based gene prediction.
Nucleic Acids Res. 2016 May 19;44(9):e89. doi: 10.1093/nar/gkw092. Epub 2016 Feb 17.
6
Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features.
Nucleic Acids Res. 2012 Nov 1;40(20):e161. doi: 10.1093/nar/gks708. Epub 2012 Jul 30.
7
MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.
BMC Bioinformatics. 2011 Dec 22;12:491. doi: 10.1186/1471-2105-12-491.
8
Direct mapping and alignment of protein sequences onto genomic sequence.
Bioinformatics. 2008 Nov 1;24(21):2438-44. doi: 10.1093/bioinformatics/btn460. Epub 2008 Aug 26.
9
Homology-based gene structure prediction: simplified matching algorithm using a translated codon (tron) and improved accuracy by allowing for long gaps.
Bioinformatics. 2000 Mar;16(3):190-202. doi: 10.1093/bioinformatics/16.3.190.
10
A genomic perspective on protein families.
Science. 1997 Oct 24;278(5338):631-7. doi: 10.1126/science.278.5338.631.