Beyond similarity assessment: selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm.

Affiliations

Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan.

Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan.

Publication information

Bioinformatics. 2018 Feb 15;34(4):576-584. doi: 10.1093/bioinformatics/btx643.

DOI: 10.1093/bioinformatics/btx643
PMID: 29040374
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5860613/
Abstract

MOTIVATION

Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy.
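
To make the state structure concrete, the following is a minimal sketch of the forward algorithm for a pair HMM over DNA, in which each hidden state has one of the three types named above (M for match, X for an insertion in the first sequence, Y for an insertion in the second). It is not taken from the authors' software. The function names, the `toy_model` helper and all parameter values are illustrative assumptions, and the model has no explicit end state, so the result is the joint probability of the two sequences under that simplification.

```python
import numpy as np

ALPHABET = "ACGT"
# How far each state type advances in sequence x and sequence y.
STEP = {"M": (1, 1), "X": (1, 0), "Y": (0, 1)}


def forward_likelihood(x, y, state_types, init, trans, emit):
    """Sum the probability of every alignment path of x and y.

    state_types: list of "M", "X" or "Y", one entry per hidden state.
    init[k]:     probability of starting in state k.
    trans[l, k]: probability of moving from state l to state k.
    emit[k]:     4x4 array for a match state, length-4 array otherwise.
    """
    xi = [ALPHABET.index(c) for c in x]
    yi = [ALPHABET.index(c) for c in y]
    n, m = len(xi), len(yi)
    # f[k, i, j]: probability of emitting x[:i] and y[:j], ending in state k.
    f = np.zeros((len(state_types), n + 1, m + 1))
    for i in range(n + 1):
        for j in range(m + 1):
            for k, kind in enumerate(state_types):
                di, dj = STEP[kind]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue  # state k cannot emit at this cell
                if kind == "M":
                    e = emit[k][xi[i - 1], yi[j - 1]]
                elif kind == "X":
                    e = emit[k][xi[i - 1]]
                else:
                    e = emit[k][yi[j - 1]]
                if pi == 0 and pj == 0:
                    incoming = init[k]  # first emission of the path
                else:
                    incoming = f[:, pi, pj] @ trans[:, k]
                f[k, i, j] = e * incoming
    return float(f[:, n, m].sum())


def toy_model(n_match=1):
    """Hypothetical parameters: n_match match states plus one X and one Y."""
    match = np.full((4, 4), 0.02) + np.eye(4) * 0.17  # favors identical bases
    indel = np.full(4, 0.25)                          # uniform over A, C, G, T
    state_types = ["M"] * n_match + ["X", "Y"]
    init = np.array([0.8 / n_match] * n_match + [0.1, 0.1])
    trans = np.tile(init, (len(state_types), 1))      # every row equals init
    emit = [match] * n_match + [indel, indel]
    return state_types, init, trans, emit
```

Calling `forward_likelihood("ACGT", "AGT", *toy_model())` returns the likelihood summed over all alignments of the two strings. Passing a longer `state_types` list, for example two match states with different emission tables, is how a model with more hidden states per type would be expressed in this sketch.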

RESULTS

We developed a novel method to select superior models (including the number of hidden states) for PHMMs. Our method selects models with the highest posterior probability using the Factorized Information Criterion, which is widely used for model selection in probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities and slightly improves alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies.
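
The idea of choosing the number of hidden states by a model-selection score can be sketched as follows. This is not the Factorized Information Criterion or the authors' algorithm: the sketch substitutes the simpler Bayesian Information Criterion (BIC) as a stand-in, and it enumerates candidate models exhaustively. The function names, the fully parameterized transition matrix and the `fit_loglik` callback (assumed to train a model with the given state counts and return its log-likelihood) are all illustrative assumptions.

```python
import itertools
import math


def n_free_params(n_match, n_ins_x, n_ins_y, alphabet_size=4):
    """Count free parameters of a fully parameterized pair HMM (assumed form)."""
    k = n_match + n_ins_x + n_ins_y
    initial = k - 1                      # initial distribution sums to 1
    transitions = k * (k - 1)            # each transition row sums to 1
    match_emissions = n_match * (alphabet_size ** 2 - 1)
    indel_emissions = (n_ins_x + n_ins_y) * (alphabet_size - 1)
    return initial + transitions + match_emissions + indel_emissions


def bic_score(loglik, n_params, n_obs):
    """BIC written so that larger is better (stand-in for FIC)."""
    return loglik - 0.5 * n_params * math.log(n_obs)


def select_model(fit_loglik, n_obs, max_states_per_type=3):
    """Return ((n_match, n_ins_x, n_ins_y), score) with the best score.

    fit_loglik(n_match, n_ins_x, n_ins_y) is a caller-supplied, hypothetical
    function that fits a model of that size and returns its log-likelihood.
    """
    best = None
    sizes = range(1, max_states_per_type + 1)
    for counts in itertools.product(sizes, repeat=3):
        score = bic_score(fit_loglik(*counts), n_free_params(*counts), n_obs)
        if best is None or score > best[1]:
            best = (counts, score)
    return best
```

The penalty term grows with the number of states, so a larger model is selected only when its gain in log-likelihood outweighs the added parameters. That trade-off is the part this sketch is meant to illustrate.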

AVAILABILITY AND IMPLEMENTATION

The software is available at https://github.com/bigsea-t/fab-phmm.

CONTACT

mhamada@waseda.jp.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/28b4/5860613/78acc756f90c/btx643f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/28b4/5860613/9790a6918662/btx643f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/28b4/5860613/45ccf5ed6721/btx643f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/28b4/5860613/0e0ce210ca06/btx643f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/28b4/5860613/501c54e36ee8/btx643f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/28b4/5860613/8b954c2bf099/btx643f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/28b4/5860613/0f91eb69a008/btx643f7.jpg