EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment.

Authors

Shen Chengze, Liu Baqiao, Williams Kelly P, Warnow Tandy

Affiliations

Computer Science, University of Illinois, Urbana-Champaign, 201 N. Goodwin Ave, Urbana, 61801, IL, USA.

Sandia National Laboratories, 7011 East Ave., Livermore, 94550, CA, USA.

Publication information

Algorithms Mol Biol. 2023 Dec 7;18(1):21. doi: 10.1186/s13015-023-00247-x.

DOI:10.1186/s13015-023-00247-x
PMID:38062452
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10704716/
Abstract

BACKGROUND

Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity.

RESULTS

We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale with high accuracy to large datasets (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA.
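To make the notion of a constraint alignment concrete, the following minimal sketch checks whether an extended alignment leaves the constraint alignment intact. It is an illustration only and is not taken from EMMA or MAFFT: the function names `restrict` and `respects_constraint`, the dictionary representation (sequence name mapped to gapped string) and the toy sequences are all assumptions made for this example.

```python
from typing import Dict, Iterable

GAP = "-"


def restrict(alignment: Dict[str, str], names: Iterable[str]) -> Dict[str, str]:
    """Keep only the rows in `names`, then drop columns that are all gaps.

    Hypothetical helper: alignments are modeled as name -> gapped string.
    """
    names = list(names)
    rows = [alignment[n] for n in names]
    width = len(rows[0]) if rows else 0
    # A column survives if at least one selected row has a residue there.
    keep = [i for i in range(width) if any(r[i] != GAP for r in rows)]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def respects_constraint(constraint: Dict[str, str],
                        extended: Dict[str, str]) -> bool:
    """Return True if `extended` induces `constraint` on the constraint rows."""
    names = list(constraint)
    # Every constraint sequence must still be present.
    if any(n not in extended for n in names):
        return False
    # All rows of a valid alignment have the same length.
    if len({len(s) for s in extended.values()}) > 1:
        return False
    return restrict(extended, names) == restrict(constraint, names)


# Toy data (invented for illustration).
constraint = {"a": "AC-GT", "b": "ACTGT"}
extended_ok = {"a": "AC--GT", "b": "AC-TGT", "q": "ACG-GT"}   # new column only
extended_bad = {"a": "ACG-T", "b": "ACTGT", "q": "ACGGT"}     # row "a" realigned

print(respects_constraint(constraint, extended_ok))
print(respects_constraint(constraint, extended_bad))
```

In this sketch, adding a query sequence may introduce new columns, but the homologies among the original sequences must be unchanged once those extra all-gap columns are removed.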

CONCLUSIONS

EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.

Figures 1-7 are available in the full-text article at https://pmc.ncbi.nlm.nih.gov/articles/PMC10704716/.
