Balasubramanian Jeya Balaji, Gopalakrishnan Vanathi
Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA 15260, United States.
Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15206, United States.
World J Clin Oncol. 2018 Sep 14;9(5):98-109. doi: 10.5306/wjco.v9.i5.98.
To develop a framework to incorporate background domain knowledge into classification rule learning for knowledge discovery in biomedicine.
Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief networks (BNs) to find the BN that best explains the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs. In this paper, we extended the Bayesian score to include informative structure priors, which encode our prior domain knowledge about the dataset. We refer to this extension of BRL as BRLp. The structure prior has a hyperparameter, λ, that lets the user tune how strongly the prior knowledge is incorporated during model learning. We studied the effect of λ on model learning using a simulated dataset and a real-world lung cancer prognostic biomarker dataset, measuring the degree to which our specified prior knowledge was incorporated. We also monitored the effect of λ on the model's predictive performance. Finally, we compared BRLp to other state-of-the-art classifiers commonly used in biomedicine.
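To make the role of λ concrete, the following is a minimal sketch, not the scoring function used in the paper. The abstract does not state the functional form of the structure prior, so the sketch assumes a common choice: an unnormalized log prior equal to -λ times the number of edges on which a candidate network disagrees with the prior knowledge. The function names, the variable names (X1, X2, Y), and the log marginal likelihood values are all hypothetical.

```python
# Illustrative only: an ASSUMED structure prior, log P(G) = -lam * (edge disagreements).
# The paper's actual prior may differ; all numbers below are made up.

def log_structure_prior(edges, prior_edges, lam):
    """Unnormalized log prior that penalizes edges disagreeing with prior knowledge."""
    disagreement = len(set(edges) ^ set(prior_edges))  # symmetric difference
    return -lam * disagreement


def log_score(log_marginal_likelihood, edges, prior_edges, lam):
    """Bayesian score in log space: data term plus structure prior term."""
    return log_marginal_likelihood + log_structure_prior(edges, prior_edges, lam)


def best_structure(candidates, prior_edges, lam):
    """Return the highest-scoring candidate (a stand-in for the search step)."""
    return max(
        candidates,
        key=lambda c: log_score(c["log_ml"], c["edges"], prior_edges, lam),
    )


# Two hypothetical candidate networks for a target variable Y.
candidates = [
    {"name": "data_favored", "edges": {("X1", "Y")}, "log_ml": -100.0},
    {"name": "prior_favored", "edges": {("X2", "Y")}, "log_ml": -103.0},
]
prior_edges = {("X2", "Y")}  # prior knowledge: X2 is a parent of Y

for lam in (0.0, 1.0, 2.0, 5.0):
    chosen = best_structure(candidates, prior_edges, lam)
    print(f"lambda={lam}: {chosen['name']}")
```

Under these assumptions, λ = 0 ignores the prior and the data-favored network wins, while a sufficiently large λ lets the prior outweigh the difference in data fit.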
Using simulated data, we evaluated the degree to which prior knowledge was incorporated into the extended BRL by measuring the Graph Edit Distance between the true data-generating model and the learned model. We specified the true model using informative structure priors. We observed that increasing the value of λ increased the influence of the specified structure priors on model learning. A large value of λ caused the extended BRL to return the true model. This also led to a gain in predictive performance, measured by the area under the receiver operating characteristic curve (AUC). We then obtained a publicly available real-world lung cancer prognostic biomarker dataset and specified a known biomarker from the literature [the epidermal growth factor receptor (EGFR) gene] as prior knowledge. We again observed that larger values of λ led to increased incorporation of EGFR into the final model. This relevant background knowledge also led to a gain in AUC.
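The two measurements named above can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code from the paper. The graph edit distance here assumes uniquely labeled nodes and unit costs for inserting or deleting a node or an edge, so it reduces to counting set differences (a reversed edge costs 2). The AUC is computed as the fraction of positive-negative pairs ranked correctly, with ties counted as half. All function names and example values are hypothetical.

```python
def labeled_graph_edit_distance(nodes_a, edges_a, nodes_b, edges_b):
    """Edit distance between two graphs with uniquely labeled nodes.

    Assumes unit cost per node or edge insertion/deletion, so the distance
    is the size of the node and edge symmetric differences.
    """
    node_cost = len(set(nodes_a) ^ set(nodes_b))
    edge_cost = len(set(edges_a) ^ set(edges_b))
    return node_cost + edge_cost


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical true model versus a learned model that picked C instead of B.
true_nodes, true_edges = {"A", "B", "Y"}, {("A", "Y"), ("B", "Y")}
learned_nodes, learned_edges = {"A", "C", "Y"}, {("A", "Y"), ("C", "Y")}

distance = labeled_graph_edit_distance(
    true_nodes, true_edges, learned_nodes, learned_edges
)
print("graph edit distance:", distance)

# Hypothetical class labels and predicted probabilities.
print("AUC:", auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A distance of 0 means the learned structure matches the true one exactly, which is the outcome the paragraph above reports for large λ.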
The extended BRL allows tunable structure priors to be incorporated during Bayesian classification rule learning, integrating data and prior knowledge, as demonstrated using lung cancer biomarker data.