Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America; Harvard Medical School, Boston, Massachusetts, United States of America.
Harvard Medical School, Boston, Massachusetts, United States of America; Children's Hospital Informatics Program, Children's Hospital Boston, Boston, Massachusetts, United States of America.
PLoS Comput Biol. 2014 Jun 12;10(6):e1003676. doi: 10.1371/journal.pcbi.1003676. eCollection 2014 Jun.
Bayesian networks (BNs) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused on predicting a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBN) formalism, which is appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analyses of public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and is available as MATLAB source code under an open source license, by anonymous download, at http://www.cgbayesnets.com.
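To make the conditional Gaussian idea concrete, the following is a minimal Python sketch. It is not CGBayesNets code and does not reflect its MATLAB API. It assumes the simplest possible network structure, in which one discrete phenotype node is the only parent of every continuous variable, so each continuous variable is Gaussian given the phenotype. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_cg_classifier(features, labels):
    """Estimate P(class) and per-class Gaussian parameters for each feature.

    Assumption: every continuous feature has the discrete class as its
    only parent (a naive-Bayes-shaped conditional Gaussian network).
    """
    params = {}
    n = len(labels)
    for c in sorted(set(labels)):
        rows = [x for x, y in zip(features, labels) if y == c]
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # Maximum-likelihood variance, floored to avoid division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / len(rows), 1e-6)
            for col, m in zip(columns, means)
        ]
        params[c] = (len(rows) / n, means, variances)
    return params


def posterior(params, x):
    """Return P(class | x) by Bayes' rule, computed in log space."""
    log_joint = {}
    for c, (prior, means, variances) in params.items():
        log_joint[c] = math.log(prior) + sum(
            gaussian_logpdf(xi, m, v) for xi, m, v in zip(x, means, variances)
        )
    top = max(log_joint.values())
    unnorm = {c: math.exp(lj - top) for c, lj in log_joint.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


# Hypothetical toy data: two continuous measurements, binary phenotype.
X = [[1.0, 5.0], [1.2, 4.8], [0.8, 5.2], [3.0, 2.0], [3.2, 2.2], [2.8, 1.8]]
y = [0, 0, 0, 1, 1, 1]

params = fit_cg_classifier(X, y)
print(posterior(params, [1.1, 4.9]))
```

The continuous values are used directly, with no discretization step, and the phenotype is predicted by inference over the fitted distributions. A general CGBN also allows continuous parents of continuous nodes (linear Gaussian dependencies) and learned network structure, neither of which this sketch attempts.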