Using evolutionary Expectation Maximization to estimate indel rates.

Author information

Holmes Ian

Affiliation

Department of Statistics, 1 South Parks Road, Oxford OX1 3TG, UK.

Publication information

Bioinformatics. 2005 May 15;21(10):2294-300. doi: 10.1093/bioinformatics/bti177. Epub 2005 Feb 24.

DOI:10.1093/bioinformatics/bti177

PMID:15731213

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7197704/

Abstract

MOTIVATION

The Expectation Maximization (EM) algorithm, in the form of the Baum-Welch algorithm (for hidden Markov models) or the Inside-Outside algorithm (for stochastic context-free grammars), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modelling, it would be useful to apply the EM algorithm to estimate not only the probability parameters of the stochastic grammar, but also the instantaneous mutation rates of the underlying evolutionary model (to facilitate the development of stochastic grammars based on phylogenetic trees, also known as Statistical Alignment). Recently, we showed how to do this for the point substitution component of the evolutionary process; here, we extend these results to the indel process.

RESULTS

We present an algorithm for maximum-likelihood estimation of insertion and deletion rates from multiple sequence alignments, using EM, under the single-residue indel model due to Thorne, Kishino and Felsenstein (the 'TKF91' model). The algorithm converges extremely rapidly, gives accurate results on simulated data that are an improvement over parsimonious estimates (which are shown to underestimate the true indel rate), and gives plausible results on experimental data (coronavirus envelope domains). Owing to the algorithm's close similarity to the Baum-Welch algorithm for training hidden Markov models, it can be used in an 'unsupervised' fashion to estimate rates for unaligned sequences, or to estimate several sets of rates for sequences with heterogeneous rates.
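As a rough illustration of the TKF91 model named above, the sketch below computes the three branch-length-dependent probabilities usually written alpha, beta and gamma, together with the model's geometric equilibrium length distribution. It is not the EM algorithm of the paper: it only shows the quantities that the insertion and deletion rates determine, which any rate estimator under this model has to work with. The function names, the argument names `insert_rate`, `delete_rate` and `t`, and the example values are assumptions made for illustration and do not come from the abstract.

```python
import math


def tkf91_probabilities(insert_rate, delete_rate, t):
    """Return (alpha, beta, gamma) for the TKF91 links model at branch length t.

    Hypothetical helper for illustration; the names are not from the paper.
    """
    if not 0.0 < insert_rate < delete_rate:
        raise ValueError("TKF91 requires 0 < insert_rate < delete_rate")
    if t <= 0.0:
        raise ValueError("branch length t must be positive")
    lam, mu = insert_rate, delete_rate
    decay = math.exp((lam - mu) * t)
    # alpha: probability that an ancestral residue survives to time t.
    alpha = math.exp(-mu * t)
    # beta: probability of a further inserted residue, given at least one
    # descendant already exists.
    beta = lam * (1.0 - decay) / (mu - lam * decay)
    # gamma: probability of an inserted residue, given that the ancestral
    # residue itself was deleted.
    gamma = 1.0 - mu * beta / (lam * (1.0 - alpha))
    return alpha, beta, gamma


def equilibrium_length_probability(insert_rate, delete_rate, n):
    """Probability of sequence length n under the geometric equilibrium."""
    ratio = insert_rate / delete_rate
    return (1.0 - ratio) * ratio ** n


if __name__ == "__main__":
    # Example rates chosen arbitrarily for demonstration.
    alpha, beta, gamma = tkf91_probabilities(0.03, 0.04, 1.0)
    print(f"alpha={alpha:.5f} beta={beta:.5f} gamma={gamma:.5f}")
```

For long branches beta approaches the ratio of the insertion rate to the deletion rate, which is also the parameter of the equilibrium length distribution. That is why the model requires the insertion rate to be strictly smaller than the deletion rate.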

AVAILABILITY

Software implementing the algorithm and the benchmark is available under GPL from http://www.biowiki.org/
