
SigAlign: an alignment algorithm guided by explicit similarity criteria.

Affiliations

Interdisciplinary Program in Bioinformatics, College of Natural Sciences, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Korea.

Genome and Health Big Data Laboratory, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Korea.

Publication information

Nucleic Acids Res. 2024 Aug 27;52(15):8717-8733. doi: 10.1093/nar/gkae607.

DOI:10.1093/nar/gkae607
PMID:39011889
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11347165/
Abstract

In biological sequence alignment, prevailing heuristic aligners achieve high-throughput by several approximation techniques, but at the cost of sacrificing the clarity of output criteria and creating complex parameter spaces. To surmount these challenges, we introduce 'SigAlign', a novel alignment algorithm that employs two explicit cutoffs for the results: minimum length and maximum penalty per length, alongside three affine gap penalties. Comparative analyses of SigAlign against leading database search tools (BLASTn, MMseqs2) and read mappers (BWA-MEM, bowtie2, HISAT2, minimap2) highlight its performance in read mapping and database searches. Our research demonstrates that SigAlign not only provides high sensitivity with a non-heuristic approach, but also surpasses the throughput of existing heuristic aligners, particularly for high-accuracy reads or genomes with few repetitive regions. As an open-source library, SigAlign is poised to become a foundational component to provide a transparent and customizable alignment process to new analytical algorithms, tools and pipelines in bioinformatics.
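
The two explicit cutoffs described in the abstract can be illustrated with a small sketch. This is not SigAlign's API or its search algorithm; it only checks whether an already computed alignment would satisfy criteria of that form. Several details are assumptions made for illustration: that the three penalties are a mismatch penalty, a gap-open penalty and a gap-extend penalty; that a gap of k positions costs gap_open + k * gap_extend; that "length" means the number of alignment columns; and that the alignment is given as a string of M (match), X (mismatch), I (insertion) and D (deletion). The names `Criteria`, `alignment_penalty` and `passes_cutoffs` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    """Hypothetical container for the five parameters named in the abstract."""
    mismatch: int                  # assumed: penalty per mismatched position
    gap_open: int                  # assumed: penalty charged once per gap
    gap_extend: int                # assumed: penalty per gapped position
    min_length: int                # cutoff 1: minimum alignment length
    max_penalty_per_length: float  # cutoff 2: maximum penalty / length


def alignment_penalty(ops: str, c: Criteria) -> int:
    """Sum the penalty of an alignment written as M/X/I/D operations."""
    penalty = 0
    prev = ""
    for op in ops:
        if op == "X":
            penalty += c.mismatch
        elif op in ("I", "D"):
            # A gap opens whenever the operation differs from the previous one.
            if op != prev:
                penalty += c.gap_open
            penalty += c.gap_extend
        elif op != "M":
            raise ValueError(f"unknown operation: {op!r}")
        prev = op
    return penalty


def passes_cutoffs(ops: str, c: Criteria) -> bool:
    """Return True if the alignment meets both explicit cutoffs."""
    length = len(ops)  # assumed: length counts every alignment column
    if length < c.min_length:
        return False
    return alignment_penalty(ops, c) <= c.max_penalty_per_length * length


if __name__ == "__main__":
    # Example values chosen for illustration only, not taken from the paper.
    criteria = Criteria(mismatch=4, gap_open=6, gap_extend=2,
                        min_length=50, max_penalty_per_length=0.1)
    one_mismatch = "M" * 48 + "X" + "M" * 51   # 100 columns, penalty 4
    three_base_gap = "M" * 40 + "III" + "M" * 57  # 100 columns, penalty 12
    print(passes_cutoffs(one_mismatch, criteria))
    print(passes_cutoffs(three_base_gap, criteria))
```

Under these assumptions, an output criterion is a plain predicate on the finished alignment, which is what makes the accepted result set easy to state: every alignment at least `min_length` long whose penalty per column does not exceed the cutoff.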


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54a4/11347165/cb74864ff4c1/gkae607fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54a4/11347165/4816706666d8/gkae607figgra1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54a4/11347165/42f119520b1f/gkae607fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54a4/11347165/dfa58e97542c/gkae607fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54a4/11347165/83f4f2d02da5/gkae607fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54a4/11347165/6602a8f35c1b/gkae607fig4.jpg

Similar articles

1
SigAlign: an alignment algorithm guided by explicit similarity criteria.
Nucleic Acids Res. 2024 Aug 27;52(15):8717-8733. doi: 10.1093/nar/gkae607.
2
Systematic benchmark of ancient DNA read mapping.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab076.
3
Minimap2: pairwise alignment for nucleotide sequences.
Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191.
4
CUSHAW3: sensitive and accurate base-space and color-space short-read alignment with hybrid seeding.
PLoS One. 2014 Jan 22;9(1):e86869. doi: 10.1371/journal.pone.0086869. eCollection 2014.
5
SRPRISM (Single Read Paired Read Indel Substitution Minimizer): an efficient aligner for assemblies with explicit guarantees.
Gigascience. 2020 Apr 1;9(4). doi: 10.1093/gigascience/giaa023.
6
ARYANA: Aligning Reads by Yet Another Approach.
BMC Bioinformatics. 2014;15 Suppl 9(Suppl 9):S12. doi: 10.1186/1471-2105-15-S9-S12. Epub 2014 Sep 10.
7
SOAP3-dp: fast, accurate and sensitive GPU-based short read aligner.
PLoS One. 2013 May 31;8(5):e65632. doi: 10.1371/journal.pone.0065632. Print 2013.
8
Assessing the impact of exact reads on reducing the error rate of read mapping.
BMC Bioinformatics. 2018 Nov 6;19(1):406. doi: 10.1186/s12859-018-2432-7.
9
Evaluation and assessment of read-mapping by multiple next-generation sequencing aligners based on genome-wide characteristics.
Genomics. 2017 Jul;109(3-4):186-191. doi: 10.1016/j.ygeno.2017.03.001. Epub 2017 Mar 9.
10
Vargas: heuristic-free alignment for assessing linear and graph read aligners.
Bioinformatics. 2020 Jun 1;36(12):3712-3718. doi: 10.1093/bioinformatics/btaa265.
