UPP2: fast and accurate alignment of datasets with fragmentary sequences.

Affiliation

Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61820, USA.

Publication information

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btad007.

DOI:10.1093/bioinformatics/btad007
PMID:36625535
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9846425/

Abstract

MOTIVATION

Multiple sequence alignment (MSA) is a basic step in many bioinformatics pipelines. However, achieving highly accurate alignments on large datasets, especially those with sequence length heterogeneity, is a challenging task. Ultra-large multiple sequence alignment using Phylogeny-aware Profiles (UPP) is a method for MSA estimation that builds an ensemble of Hidden Markov Models (eHMM) to represent an estimated alignment on the full-length sequences in the input, and then adds the remaining sequences into the alignment using selected HMMs in the ensemble. Although UPP provides good accuracy, it is computationally intensive on large datasets.
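
As a rough illustration of the first step described above, the sketch below separates an input set into "full-length" sequences (candidates for the initial alignment that the ensemble of HMMs represents) and the remaining sequences that would be added afterwards. The function name `split_by_length` and the rule "within 25% of the median length" are assumptions made for this example only; the abstract does not state how full-length sequences are identified.

```python
from statistics import median


def split_by_length(seqs, tolerance=0.25):
    """Partition sequences into (full_length, remaining) by length.

    A sequence is treated as full-length when its length lies within
    `tolerance` of the median length. This threshold is an assumption
    for illustration, not a value taken from the abstract.
    """
    med = median(len(s) for s in seqs.values())
    lo, hi = (1 - tolerance) * med, (1 + tolerance) * med
    full = {name: s for name, s in seqs.items() if lo <= len(s) <= hi}
    rest = {name: s for name, s in seqs.items() if name not in full}
    return full, rest


# Toy input: three near-full-length sequences and two fragments.
seqs = {
    "s1": "ACGTACGTACGT",
    "s2": "ACGTACGAACGT",
    "s3": "ACGTACGTACG",
    "s4": "ACGT",
    "s5": "TACG",
}
full, rest = split_by_length(seqs)
print(sorted(full), sorted(rest))
```

In a real pipeline the `full` set would be aligned and used to build the HMMs, and each sequence in `rest` would then be aligned against a selected HMM.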

RESULTS

We present UPP2, a direct improvement on UPP. The main advance is a fast technique for selecting HMMs in the ensemble that allows us to achieve the same accuracy as UPP but with greatly reduced runtime. We show that UPP2 produces more accurate alignments compared to leading MSA methods on datasets exhibiting substantial sequence length heterogeneity and is among the most accurate otherwise.
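
The abstract does not say how the faster HMM selection works, so the following is only a hypothetical sketch of why selection cost matters. It assumes the ensemble is organised as a tree and compares scoring a query against every model with a greedy top-down search that scores only the children of the current node. The `Node` class, both selection functions, and the shared 3-mer count (a toy stand-in for an HMM bit score) are all invented for illustration and are not the UPP2 implementation.

```python
from dataclasses import dataclass, field
from typing import List


def kmers(s, k=3):
    """Return the set of k-mers in a string."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


@dataclass
class Node:
    name: str
    consensus: str  # toy stand-in for a profile HMM
    children: List["Node"] = field(default_factory=list)

    def score(self, query):
        # Toy score: number of 3-mers shared with the consensus.
        return len(kmers(query) & kmers(self.consensus))


def exhaustive_select(root, query):
    """Score the query against every node; return (best name, evaluations)."""
    best_name, best_score, evals = None, -1, 0
    stack = [root]
    while stack:
        node = stack.pop()
        evals += 1
        s = node.score(query)
        if s > best_score:
            best_name, best_score = node.name, s
        stack.extend(node.children)
    return best_name, evals


def hierarchical_select(root, query):
    """Greedy descent: at each level follow the best-scoring child."""
    node, best_name, best_score, evals = root, root.name, root.score(query), 1
    while node.children:
        scored = [(child.score(query), child) for child in node.children]
        evals += len(scored)
        s, node = max(scored, key=lambda pair: pair[0])
        if s > best_score:
            best_name, best_score = node.name, s
    return best_name, evals


tree = Node("root", "ACGTACGTAC", [
    Node("A", "ACGTACGTTT", [Node("A1", "ACGTTTTTTT"), Node("A2", "ACGTACGGGG")]),
    Node("B", "TTGGCCAATT", [Node("B1", "TTGGCCGGGG"), Node("B2", "GGCCAATTAA")]),
])
query = "CCAATTA"  # a short, fragment-like query
print(exhaustive_select(tree, query))
print(hierarchical_select(tree, query))
```

On this toy tree both strategies pick the same node, but the greedy search evaluates fewer models. A greedy descent is not guaranteed to find the globally best model, which is why keeping accuracy unchanged while reducing runtime is a meaningful result.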

AVAILABILITY AND IMPLEMENTATION

https://github.com/gillichu/sepp.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
