Wang Mingyi, Chen Zuozhou, Cloutier Sylvie
Agriculture and Agri-Food Canada, Cereal Research Centre, Winnipeg, MB R3T 2M9, Canada.
Comput Biol Chem. 2007 Oct;31(5-6):361-72. doi: 10.1016/j.compbiolchem.2007.08.005. Epub 2007 Aug 19.
A Bayesian network (BN) is a knowledge representation formalism that has proven to be a promising tool for analyzing gene expression data. However, several problems still restrict its successful application. Typical gene expression databases contain measurements for thousands of genes but no more than several hundred samples, whereas most existing BN learning algorithms do not scale beyond a few hundred variables. Current methods yield poor-quality BNs when applied to such high-dimensional datasets. We propose a hybrid method, combining constraint-based analysis with score-based search, that is effective for learning gene networks from DNA microarray data. In the first phase, a novel algorithm generates a skeleton BN based on dependency analysis. The BN structure is then searched using a scoring metric combined with the knowledge learned in the first phase. Computational tests show that the proposed method achieves more accurate results than state-of-the-art methods. The method also scales to datasets with more than several hundred variables.
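The two-phase scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not specify the dependency test or the scoring metric, so the sketch substitutes a pairwise mutual-information threshold for the novel skeleton algorithm (phase 1) and a BIC score with greedy hill climbing restricted to skeleton edges (phase 2). The function names (`mutual_information`, `family_bic`, `hybrid_learn`), the `threshold` value, and the assumption that expression levels are already discretized are all hypothetical.

```python
import itertools
import math
from collections import Counter


def mutual_information(data, i, j):
    """Empirical mutual information (nats) between discrete columns i and j."""
    n = len(data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    pij = Counter((row[i], row[j]) for row in data)
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())


def family_bic(data, child, parents, card):
    """BIC score of one node given its parent set (assumed metric, not the paper's)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    q = math.prod(card[p] for p in parents)       # number of parent configurations
    n_params = (card[child] - 1) * q
    return loglik - 0.5 * math.log(n) * n_params


def creates_cycle(parents, u, v):
    """True if adding u -> v would close a cycle, i.e. v is already an ancestor of u."""
    stack, seen = [u], set()
    while stack:
        node = stack.pop()
        if node == v:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def hybrid_learn(data, threshold=0.05):
    """Hypothetical hybrid learner: MI skeleton, then BIC hill climbing on skeleton edges."""
    n_vars = len(data[0])
    card = [len({row[i] for row in data}) for i in range(n_vars)]

    # Phase 1: keep an undirected edge only if the pair looks dependent.
    skeleton = {(i, j) for i, j in itertools.combinations(range(n_vars), 2)
                if mutual_information(data, i, j) > threshold}

    # Phase 2: greedy add/delete of directed edges, candidates limited to the skeleton.
    parents = {v: set() for v in range(n_vars)}
    score = {v: family_bic(data, v, [], card) for v in range(n_vars)}
    while True:
        best_delta, best_move = 1e-12, None
        for i, j in skeleton:
            for u, v in ((i, j), (j, i)):
                if u in parents[v]:
                    new_parents = parents[v] - {u}          # try deleting u -> v
                elif creates_cycle(parents, u, v):
                    continue                                # adding u -> v breaks the DAG
                else:
                    new_parents = parents[v] | {u}          # try adding u -> v
                delta = family_bic(data, v, sorted(new_parents), card) - score[v]
                if delta > best_delta:
                    best_delta, best_move = delta, (v, new_parents)
        if best_move is None:                               # no strictly improving move left
            return skeleton, parents
        v, new_parents = best_move
        parents[v] = new_parents
        score[v] = family_bic(data, v, sorted(new_parents), card)


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    rows = []
    for _ in range(500):
        a = rng.randint(0, 1)
        b = a if rng.random() < 0.9 else 1 - a   # b depends on a
        c = rng.randint(0, 1)                     # c is independent of both
        rows.append((a, b, c))
    print(hybrid_learn(rows))
```

Restricting phase 2 to skeleton edges is what keeps the search tractable as the number of genes grows: the number of candidate moves per iteration is proportional to the skeleton size rather than to the square of the number of variables.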