Dang Utkarsh J, Golding G Brian
Department of Biology, McMaster University, Hamilton, Ontario, L8S 4K1, Canada.
Bioinformatics. 2016 Jan 1;32(1):130-2. doi: 10.1093/bioinformatics/btv541. Epub 2015 Sep 11.
Continuous-time Markov chain models with finite state space are routinely used to analyze discrete character data on phylogenetic trees. Examples of such data include restriction sites, gene family presence/absence, intron presence/absence and gene family size. While models with constrained substitution rate matrices have been used to good effect, the recent literature has increasingly implemented more biologically realistic models that combine, for example, site rate variation, site partitioning, branch-specific rates, non-stationary prior root probabilities and corrections for sampling bias. Here, a flexible and fast R package is introduced that infers evolutionary rates of discrete characters on a tree within a probabilistic framework. The package, markophylo, fits maximum-likelihood models using Markov chains on phylogenetic trees. It is efficient, with the workhorse functions written in C++ and a user-friendly interface in R.
markophylo is available as a platform-independent R package from the Comprehensive R Archive Network at https://cran.r-project.org/web/packages/markophylo/. A vignette with numerous examples is also provided with the R package.
Supplementary data are available at Bioinformatics online.
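To make the kind of computation described above concrete, the following is a minimal sketch of the likelihood of presence/absence data on a small tree under a two-state continuous-time Markov chain, evaluated with the standard pruning recursion. It is not the markophylo API or its implementation: the function names, the nested-list tree encoding, the example tree, the rates and the uniform root probabilities are all assumptions chosen for illustration.

```python
import math

def transition_matrix(gain, loss, t):
    """Closed-form P(t) for a two-state chain (0 = absent, 1 = present)."""
    total = gain + loss
    decay = math.exp(-total * t)
    pi0, pi1 = loss / total, gain / total  # stationary probabilities
    return [[pi0 + pi1 * decay, pi1 * (1.0 - decay)],
            [pi0 * (1.0 - decay), pi1 + pi0 * decay]]

def partial_likelihoods(node, states, gain, loss):
    """Return [L(0), L(1)]: probability of the tip data below `node`
    given that `node` is in state 0 or 1 (pruning recursion)."""
    if isinstance(node, str):  # leaf: observed state is known
        return [1.0 if states[node] == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, length in node:  # internal node: list of (child, branch length)
        p = transition_matrix(gain, loss, length)
        below = partial_likelihoods(child, states, gain, loss)
        for s in (0, 1):
            result[s] *= sum(p[s][k] * below[k] for k in (0, 1))
    return result

def site_likelihood(tree, states, gain, loss, root_prob=(0.5, 0.5)):
    """Likelihood of one character pattern; root_prob need not be stationary."""
    at_root = partial_likelihoods(tree, states, gain, loss)
    return root_prob[0] * at_root[0] + root_prob[1] * at_root[1]

def log_likelihood(tree, patterns, gain, loss, root_prob=(0.5, 0.5)):
    """Sum of log-likelihoods over independent character patterns."""
    return sum(math.log(site_likelihood(tree, s, gain, loss, root_prob))
               for s in patterns)

# Hypothetical example tree ((A:0.1,B:0.1):0.2,C:0.3) and made-up data.
TREE = [([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.3)]
PATTERNS = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 0, "B": 0, "C": 0},
    {"A": 1, "B": 0, "C": 1},
]

if __name__ == "__main__":
    print(log_likelihood(TREE, PATTERNS, gain=0.3, loss=0.7))
```

A maximum-likelihood fit would pass `log_likelihood` to a numerical optimizer over `gain` and `loss`. Features mentioned in the abstract, such as site rate variation or branch-specific rates, would add further parameters to the same recursion.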