WITCH: Improved Multiple Sequence Alignment Through Weighted Consensus Hidden Markov Model Alignment.

Authors

Shen Chengze, Park Minhyuk, Warnow Tandy

Affiliation

Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA.

Publication information

J Comput Biol. 2022 Aug;29(8):782-801. doi: 10.1089/cmb.2021.0585. Epub 2022 May 17.

DOI:10.1089/cmb.2021.0585
PMID:35575747
Abstract

Accurate multiple sequence alignment is challenging on many data sets, including those that are large, evolve under high rates of evolution, or have sequence length heterogeneity. While substantial progress has been made over the last decade in addressing the first two challenges, sequence length heterogeneity remains a significant issue for many data sets. Sequence length heterogeneity occurs for biological and technological reasons, including large insertions or deletions (indels) that occurred in the evolutionary history relating the sequences, or the inclusion of sequences that are not fully assembled. Ultra-large alignments using Phylogeny-Aware Profiles (UPP) (Nguyen et al. 2015) is one of the most accurate approaches for aligning data sets that exhibit sequence length heterogeneity: it constructs an alignment on the subset of sequences it considers "full-length," represents this "backbone alignment" using an ensemble of hidden Markov models (HMMs), and then adds each remaining sequence into the backbone alignment based on an HMM selected for that sequence from the ensemble. Our new method, WeIghTed Consensus Hmm alignment (WITCH), improves on UPP in three important ways: first, it uses a statistically principled technique to weight and rank the HMMs; second, it uses multiple HMMs from the ensemble rather than a single HMM; and third, it combines the alignments for each of the selected HMMs using a consensus algorithm that takes the weights into account. We show that this approach provides improved alignment accuracy compared with UPP and other leading alignment methods, as well as improved accuracy for maximum likelihood trees based on these alignments.
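
The three steps named in the abstract (weight and rank the HMMs, keep more than one of them, merge their alignments by weight) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the weighting formula or the consensus algorithm, so the weight form (subset size times 2 to the bit score, normalized), the per-position weighted vote, and every name below (`hmm_weights`, `top_k`, `weighted_consensus`, the parameter `k`) are assumptions made for illustration only.

```python
from collections import defaultdict


def hmm_weights(bit_scores, subset_sizes):
    """Normalized weight per HMM for one query sequence.

    Assumed form (not taken from the paper's text): w_i is proportional
    to s_i * 2**BS_i, where BS_i is the bit score of the query against
    HMM i and s_i is the number of backbone sequences behind HMM i.
    """
    top = max(bit_scores)  # shift by the max score for numerical stability
    raw = [s * 2.0 ** (b - top) for b, s in zip(bit_scores, subset_sizes)]
    total = sum(raw)
    return [r / total for r in raw]


def top_k(weights, k):
    """Indices of the k highest-weight HMMs, best first."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:k]


def weighted_consensus(placements, weights):
    """Simplified stand-in for a weight-aware consensus step.

    placements[i][p] is the backbone column that HMM i assigns to query
    position p (None could mark an insertion). Each query position is
    sent to the column with the largest total weight.
    """
    consensus = []
    for p in range(len(placements[0])):
        votes = defaultdict(float)
        for cols, w in zip(placements, weights):
            votes[cols[p]] += w
        consensus.append(max(votes, key=votes.get))
    return consensus


# Hypothetical inputs: three HMMs, a query of three residues.
w = hmm_weights([10.0, 8.0, 1.0], [4, 4, 4])
chosen = top_k(w, 2)
merged = weighted_consensus([[0, 1, 2], [0, 2, 3], [0, 2, 3]], [0.4, 0.35, 0.25])
```

A per-position vote like this does not guarantee that the chosen columns stay in left-to-right order, so a real consensus step would need to enforce a consistent alignment; the sketch only shows how weights could influence the merge.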


Similar articles

1. WITCH: Improved Multiple Sequence Alignment Through Weighted Consensus Hidden Markov Model Alignment.
   J Comput Biol. 2022 Aug;29(8):782-801. doi: 10.1089/cmb.2021.0585. Epub 2022 May 17.
2. UPP2: fast and accurate alignment of datasets with fragmentary sequences.
   Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btad007.
3. HMMerge: an ensemble method for multiple sequence alignment.
   Bioinform Adv. 2023 Apr 17;3(1):vbad052. doi: 10.1093/bioadv/vbad052. eCollection 2023.
4. Ultra-large alignments using phylogeny-aware profiles.
   Genome Biol. 2015 Jun 16;16(1):124. doi: 10.1186/s13059-015-0688-z.
5. Sequence alignments and pair hidden Markov models using evolutionary history.
   J Mol Biol. 2003 Oct 17;333(2):453-60. doi: 10.1016/j.jmb.2003.08.015.
6. Fast multiple sequence alignment via multi-armed bandits.
   Bioinformatics. 2024 Jun 28;40(Suppl 1):i328-i336. doi: 10.1093/bioinformatics/btae225.
7. Enhancing the quality of phylogenetic analysis using fuzzy hidden Markov model alignments.
   Stud Health Technol Inform. 2007;129(Pt 2):1245-9.
8. Alignment of multiple proteins with an ensemble of hidden Markov models.
   Int J Data Min Bioinform. 2010;4(1):60-71. doi: 10.1504/ijdmb.2010.030967.
9. MAGUS+eHMMs: improved multiple sequence alignment accuracy for fragmentary sequences.
   Bioinformatics. 2022 Jan 27;38(4):918-924. doi: 10.1093/bioinformatics/btab788.
10. SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.
    Syst Biol. 2012 Jan;61(1):90-106. doi: 10.1093/sysbio/syr095. Epub 2011 Dec 1.

Cited by

1. Functionally distinct core microbes of Tricholoma matsutake revealed by cross-study analysis.
   Microbiome. 2026 Feb 4;14(1):58. doi: 10.1186/s40168-025-02329-x.
2. TIPP3 and TIPP3-fast: Improved abundance profiling in metagenomics.
   PLoS Comput Biol. 2025 Apr 4;21(4):e1012593. doi: 10.1371/journal.pcbi.1012593. eCollection 2025 Apr.
3. EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment.
   Algorithms Mol Biol. 2023 Dec 7;18(1):21. doi: 10.1186/s13015-023-00247-x.
4. HMMerge: an ensemble method for multiple sequence alignment.
   Bioinform Adv. 2023 Apr 17;3(1):vbad052. doi: 10.1093/bioadv/vbad052. eCollection 2023.
5. WITCH-NG: efficient and accurate alignment of datasets with sequence length heterogeneity.
   Bioinform Adv. 2023 Mar 6;3(1):vbad024. doi: 10.1093/bioadv/vbad024. eCollection 2023.
6. UPP2: fast and accurate alignment of datasets with fragmentary sequences.
   Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btad007.