Beyond similarity assessment: selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm.

Affiliations

Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan.

Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan.

Publication details

Bioinformatics. 2018 Feb 15;34(4):576-584. doi: 10.1093/bioinformatics/btx643.

Abstract

MOTIVATION

Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy.
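
To make the three state types concrete, here is a minimal Python sketch of the forward algorithm for the textbook three-state pair HMM (one match state M and two gap states X and Y, as described in Durbin et al., *Biological Sequence Analysis*). It is not the authors' fab-phmm implementation, and it uses exactly one hidden state per type, which is the setting the paper generalizes. The parameter names `delta` (gap open), `epsilon` (gap extend), `tau` (end), `mu` (mismatch rate) and the uniform DNA emission model are illustrative assumptions.

```python
import math


def pair_hmm_forward(x, y, delta=0.1, epsilon=0.4, tau=0.01, mu=0.1):
    """Return P(x, y) summed over all alignments under a three-state pair HMM.

    Hypothetical parameters, not taken from the paper. They must keep
    1 - 2*delta - tau and 1 - epsilon - tau non-negative.
    """
    n, m = len(x), len(y)

    def p_match(a, b):
        # Joint emission of an aligned pair: the 4 identical pairs share 1 - mu,
        # the 12 mismatched pairs share mu.
        return (1 - mu) / 4 if a == b else mu / 12

    q = 0.25                     # gap-state emission probability of each DNA base
    mm = 1 - 2 * delta - tau     # M -> M (the Begin state behaves like M)
    gm = 1 - epsilon - tau       # X -> M and Y -> M; X <-> Y is not allowed

    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0               # Begin

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # M emits the aligned pair (x_i, y_j)
                fM[i][j] = p_match(x[i - 1], y[j - 1]) * (
                    mm * fM[i - 1][j - 1]
                    + gm * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:            # X emits x_i against a gap
                fX[i][j] = q * (delta * fM[i - 1][j] + epsilon * fX[i - 1][j])
            if j > 0:            # Y emits y_j against a gap
                fY[i][j] = q * (delta * fM[i][j - 1] + epsilon * fY[i][j - 1])

    # Every state moves to End with probability tau.
    return tau * (fM[n][m] + fX[n][m] + fY[n][m])


print(pair_hmm_forward("ACGT", "ACGT") > pair_hmm_forward("ACGT", "TGCA"))  # True
```

Multiplying raw probabilities underflows on realistic sequence lengths, so a practical implementation would work in log space or rescale each row.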

RESULTS

We developed a novel method to select superior models (including the number of hidden states) for PHMMs. Our method selects the model with the highest posterior probability using the Factorized Information Criterion (FIC), which is widely used for model selection in probabilistic models with hidden variables. Our simulations indicated that the method has excellent model selection capability and slightly improves alignment accuracy. We applied our method to DNA datasets from 5 and 28 species and ultimately selected more complex models than those used in previous studies.
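
The main difference between FIC and a simpler criterion such as BIC lies in the complexity penalty. The Python sketch below is a simplification based on the general form of FIC for latent-variable models, not the exact PHMM objective used in the paper. BIC charges every parameter ½ log N. FIC instead charges the D_k parameters tied to hidden state k roughly (D_k/2) times the log of the expected number of observations that state explains. The function names, their inputs (expected complete-data log-likelihood, entropy of the posterior over hidden states, per-state parameter counts, expected state counts, shared-parameter count) and the toy numbers are hypothetical.

```python
import math


def bic_penalty(dims_per_state, n_obs):
    # BIC: every parameter is charged (1/2) log N, however rarely its state is used.
    return 0.5 * sum(dims_per_state) * math.log(n_obs)


def fic_penalty(dims_per_state, expected_counts):
    # FIC-style (simplified): the D_k parameters of hidden state k are charged
    # (D_k / 2) * log(count_k), where count_k is the expected number of
    # observations assigned to state k.
    return sum(0.5 * d * math.log(c) for d, c in zip(dims_per_state, expected_counts))


def fic_style_score(expected_complete_loglik, entropy, dims_per_state,
                    expected_counts, shared_dims, n_obs):
    # Hypothetical simplified objective: expected complete-data log-likelihood
    # + entropy of q(Z) - per-state penalty - (1/2) log N per shared parameter.
    return (expected_complete_loglik + entropy
            - fic_penalty(dims_per_state, expected_counts)
            - 0.5 * shared_dims * math.log(n_obs))


# Toy comparison: two states with 3 parameters each, one of them rarely used.
dims, counts, n = [3, 3], [995.0, 5.0], 1000
print(fic_penalty(dims, counts) < bic_penalty(dims, n))  # True
```

In a model-selection loop, one would fit each candidate configuration (numbers of match, insertion and deletion states), compute such a score, and keep the highest-scoring candidate. General FAB inference also uses this state-dependent penalty during optimization to shrink rarely used states, which this sketch does not attempt.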

AVAILABILITY AND IMPLEMENTATION

The software is available at https://github.com/bigsea-t/fab-phmm.

CONTACT

mhamada@waseda.jp

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
