Chen Qiongyu, Li Guoliang, Leong Tze-Yun, Heng Chew-Kiat
Medical Computing Laboratory, School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543.
Stud Health Technol Inform. 2007;129(Pt 2):1219-24.
Coronary artery disease (CAD) is a leading cause of death worldwide. Finding cost-effective methods to predict CAD is a major challenge in public health. In this paper, we investigate the combined effects of genetic polymorphisms and non-genetic factors on predicting the risk of CAD by applying well-known classification methods: Bayesian networks, naïve Bayes, support vector machines, k-nearest neighbors, neural networks and decision trees. Our experiments show that all of these classifiers achieve comparable accuracy, while Bayesian networks have the additional advantage of providing insight into the relationships among the variables. We observe that the learned Bayesian networks identify many important dependency relationships among genetic variables, which can be verified with domain knowledge. Consistent with current domain understanding, our results indicate that related diseases (e.g., diabetes and hypertension), age and smoking status are the most important factors for CAD prediction, while genetic polymorphisms exert more complex influences.
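As a rough illustration of one of the classifiers named in the abstract, the sketch below fits a categorical naïve Bayes model with Laplace smoothing using only the Python standard library. It is not the authors' implementation and does not use their data: the feature layout (diabetes, hypertension, smoking status, one genotype), the eight training records and the function names `train_naive_bayes`, `predict` and `accuracy` are all hypothetical, chosen only to show how genetic and non-genetic variables can be fed to the same model.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and per-feature value frequencies for categorical data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return {
        "class_counts": class_counts,
        "value_counts": value_counts,
        "feature_values": feature_values,
        "alpha": alpha,
        "n": len(labels),
    }


def predict(model, row):
    """Return the label with the highest smoothed log-posterior score."""
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        score = math.log(count / model["n"])  # log prior
        for i, value in enumerate(row):
            seen = model["value_counts"][(i, label)][value]
            k = len(model["feature_values"][i])
            # Laplace smoothing keeps unseen value/label pairs from zeroing the score.
            score += math.log((seen + alpha) / (count + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted label matches the given label."""
    hits = sum(predict(model, row) == label for row, label in zip(rows, labels))
    return hits / len(labels)


# Hypothetical toy records: (diabetes, hypertension, smoking status, genotype).
TRAIN_ROWS = [
    ("yes", "yes", "smoker", "AA"),
    ("yes", "no", "smoker", "AG"),
    ("no", "yes", "smoker", "GG"),
    ("yes", "yes", "non-smoker", "AG"),
    ("no", "no", "non-smoker", "AA"),
    ("no", "no", "smoker", "AG"),
    ("no", "yes", "non-smoker", "GG"),
    ("no", "no", "non-smoker", "GG"),
]
TRAIN_LABELS = ["CAD"] * 4 + ["control"] * 4

model = train_naive_bayes(TRAIN_ROWS, TRAIN_LABELS)
print(predict(model, ("yes", "yes", "smoker", "AG")))
print(accuracy(model, TRAIN_ROWS, TRAIN_LABELS))
```

Accuracy measured on the training records, as in the last line, is optimistic; a comparison of several classifiers would normally use held-out data or cross-validation. Naïve Bayes also treats the variables as conditionally independent given the class, so unlike a learned Bayesian network it cannot expose dependencies among the genetic variables.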