Antal Peter, Fannes Geert, Timmerman Dirk, Moreau Yves, De Moor Bart
Department of Electrical Engineering, ESAT/SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium.
Artif Intell Med. 2004 Mar;30(3):257-81. doi: 10.1016/j.artmed.2003.11.007.
Thanks to its increasing availability, electronic literature has become a potential source of information for developing complex Bayesian networks (BNs) when human expertise is missing or when data are scarce or noisy. This opportunity raises the question of how to integrate information from free-text resources with statistical data when learning Bayesian networks. First, we report on the collection of prior information resources in the ovarian cancer domain, which includes "kernel" annotations of the domain variables. We then introduce methods that use these annotations and the literature to derive informative pairwise dependency measures: one based on the statistical co-occurrence of the variable names, one based on the similarity of the "kernel" descriptions of the variables, and a combined method. We perform a wide-scale evaluation of these text-based dependency scores against an expert reference and against data-based scores (mutual information (MI) and a Bayesian score). Next, we transform the text-based dependency measures into informative text-based priors for Bayesian network structures. Finally, we report the benefit of such informative text-based priors for the performance of a Bayesian network that classifies ovarian tumors from clinical data.
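To make the pipeline described in the abstract more concrete, the following is a minimal sketch of one possible reading of it, not the authors' actual method. It scores each pair of variables by the mutual information between the events "a document mentions variable A" and "a document mentions variable B", then rescales those scores into per-edge probabilities that define a simple prior over (undirected) network structures. The variable names, the toy document collection, the linear rescaling, the `low`/`high` bounds, and the independent-edge assumption are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def pair(a, b):
    """Canonical (order-independent) key for an unordered variable pair."""
    return tuple(sorted((a, b)))


def cooccurrence_mi(docs, a, b):
    """Mutual information (in nats) between the binary events
    'a document mentions a' and 'a document mentions b'."""
    n = len(docs)
    mi = 0.0
    for in_a in (True, False):
        for in_b in (True, False):
            joint = sum((a in d) == in_a and (b in d) == in_b for d in docs) / n
            p_a = sum((a in d) == in_a for d in docs) / n
            p_b = sum((b in d) == in_b for d in docs) / n
            if joint > 0:  # terms with zero joint probability contribute 0
                mi += joint * math.log(joint / (p_a * p_b))
    return mi


def edge_prior(scores, low=0.1, high=0.9):
    """Assumed transformation: linearly rescale scores into edge
    probabilities in [low, high], so that no edge gets probability 0 or 1."""
    top = max(scores.values())
    if top == 0:
        return {k: low for k in scores}
    return {k: low + (high - low) * s / top for k, s in scores.items()}


def log_structure_prior(edges, prior):
    """log P(G) under the simplifying assumption that edges are present
    independently; edge direction and acyclicity are ignored here."""
    return sum(math.log(p if k in edges else 1.0 - p) for k, p in prior.items())


# Hypothetical toy corpus: each document is the set of variable names it mentions.
docs = [
    {"CA125", "ascites"},
    {"CA125", "ascites"},
    {"age"},
    {"CA125", "ascites", "age"},
    {"age"},
    set(),
]
variables = ["CA125", "ascites", "age"]

scores = {pair(a, b): cooccurrence_mi(docs, a, b)
          for a, b in combinations(variables, 2)}
prior = edge_prior(scores)

# Compare two candidate structures under the text-based prior.
g_text_supported = {pair("CA125", "ascites")}
g_text_unsupported = {pair("CA125", "age")}
print(log_structure_prior(g_text_supported, prior))
print(log_structure_prior(g_text_unsupported, prior))
```

In a structure-learning setting, a log prior of this kind would typically be added to a data-based log score (such as a Bayesian score) for each candidate network, so that literature evidence matters most when the clinical data are scarce.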