Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
J Biomed Inform. 2011 Oct;44(5):859-68. doi: 10.1016/j.jbi.2011.05.004. Epub 2011 May 27.
In this paper, we propose a novel method that combines PubMed knowledge with Electronic Health Record (EHR) data to develop a weighted Bayesian Network Inference (BNI) model for pancreatic cancer prediction. We selected 20 common risk factors associated with pancreatic cancer and used PubMed knowledge to weight them. A keyword-based algorithm was developed to extract PubMed abstracts and classify them into three categories representing positive, negative, or neutral associations between each risk factor and pancreatic cancer. We then designed a weighted BNI model by adding the normalized weights to a conventional BNI model. We used this model to extract the EHR values for patients with and without pancreatic cancer, which enabled us to calculate the prior probabilities for the 20 risk factors in the BNI. The software iDiagnosis was designed to use this weighted BNI model to predict pancreatic cancer. In an evaluation on a case-control dataset, the weighted BNI model significantly outperformed the conventional BNI and two other classifiers (k-Nearest Neighbor and Support Vector Machine). We conclude that the weighted BNI using PubMed knowledge and EHR data is markedly more accurate than existing representative methods for pancreatic cancer prediction.
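The pipeline described in the abstract (keyword-based classification of abstracts, normalization of the resulting weights, and weighted inference) can be illustrated with a minimal Python sketch. The abstract gives neither the keyword lists nor the weighting formula, so everything here is an assumption made for illustration: the cue phrases, the choice of "share of positive abstracts" as the raw weight, and the function names `classify_abstract`, `factor_weights`, and `weighted_posterior` are all hypothetical. The inference step is also simplified to a weighted naive-Bayes update, not a full Bayesian network, and is not the iDiagnosis implementation.

```python
import math
from collections import Counter

# Hypothetical cue phrases; the paper's actual keyword lists are not in the abstract.
POSITIVE_CUES = ("increased risk", "risk factor", "positively associated")
NEGATIVE_CUES = ("decreased risk", "protective", "inversely associated")


def classify_abstract(text):
    """Label an abstract 'positive', 'negative', or 'neutral' by counting cue phrases."""
    lowered = text.lower()
    pos = sum(lowered.count(cue) for cue in POSITIVE_CUES)
    neg = sum(lowered.count(cue) for cue in NEGATIVE_CUES)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def factor_weights(abstracts_by_factor):
    """Assumed weighting: raw weight is the share of abstracts labelled positive,
    then all raw weights are scaled so that they sum to 1."""
    raw = {}
    for factor, abstracts in abstracts_by_factor.items():
        labels = Counter(classify_abstract(a) for a in abstracts)
        raw[factor] = labels["positive"] / max(len(abstracts), 1)
    total = sum(raw.values())
    if total == 0:
        # No positive evidence anywhere: fall back to uniform weights.
        return {factor: 1 / len(raw) for factor in raw}
    return {factor: w / total for factor, w in raw.items()}


def weighted_posterior(prior, likelihoods, weights, observed):
    """Weighted naive-Bayes posterior P(cancer | observed factors).

    likelihoods[factor] = (P(present | cancer), P(present | no cancer)),
    which in the paper's setting would be estimated from EHR data.
    Each factor's log-likelihood ratio is scaled by its literature weight.
    """
    log_odds = math.log(prior / (1 - prior))
    for factor, present in observed.items():
        p_case, p_ctrl = likelihoods[factor]
        if not present:
            p_case, p_ctrl = 1 - p_case, 1 - p_ctrl
        log_odds += weights[factor] * math.log(p_case / p_ctrl)
    return 1 / (1 + math.exp(-log_odds))
```

In this sketch a weight of 1 reproduces the ordinary naive-Bayes update for that factor, while a weight of 0 removes the factor from the prediction. That is one simple way to let literature evidence temper probabilities estimated from EHR data.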