Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN, USA.
National Institute for Mathematical and Biological Synthesis.
Bioinformatics. 2018 Jul 15;34(14):2496-2498. doi: 10.1093/bioinformatics/bty138.
AnaCoDa is an R package for estimating biologically relevant parameters of mixture models, such as selection against translation inefficiency, nonsense errors and ribosome pausing time, from genomic and high-throughput datasets. AnaCoDa provides an adaptive Bayesian MCMC algorithm, fully implemented in C++ for high performance, with an ergonomic R interface to improve usability. AnaCoDa employs a generic object-oriented design that allows users to extend the framework and implement their own models. The models currently implemented in AnaCoDa can accurately estimate biologically relevant parameters given either protein-coding sequences or ribosome footprinting data. Optionally, AnaCoDa can use additional data sources, such as gene expression measurements, to aid model fitting and parameter estimation. Because of its hierarchical object structure, some parameters can vary between sets of genes while others are shared. Genes may be assigned to clusters by the user, or cluster membership may be estimated by AnaCoDa. This flexibility allows users to estimate the same model parameter under different biological conditions and to categorize genes into different sets based on shared model properties embedded within the data. AnaCoDa also allows users to generate simulated data, which can be used to aid model development and analysis and to evaluate model adequacy. Finally, AnaCoDa contains a set of visualization routines and the ability to revisit or re-initiate previous model fits, providing researchers with a well-rounded, easy-to-use framework for analyzing genome-scale data.
AnaCoDa is freely available under the Mozilla Public License 2.0 on CRAN (https://cran.r-project.org/web/packages/AnaCoDa/).