Chen Bolin, Wang Jianxin, Li Min, Wu Fang-Xiang
BMC Med Genomics. 2014;7 Suppl 2(Suppl 2):S2. doi: 10.1186/1755-8794-7-S2-S2. Epub 2014 Oct 22.
Multiple types of data are now available for identifying disease genes, including gene-disease associations, disease phenotype similarities, protein-protein interactions, pathways, and gene expression profiles. Integrating these different kinds of biological data is believed to be an effective way to identify disease genes.
In this paper, we propose a multiple data integration method for identifying human disease genes, based on Markov random field (MRF) theory and Bayesian analysis. The proposed method is flexible, in that it readily incorporates different kinds of data, and reliable in predicting candidate disease genes.
Numerical experiments are carried out by integrating known gene-disease associations, protein complexes, protein-protein interactions, pathways, and gene expression profiles. Predictions are evaluated by the leave-one-out method. The proposed method achieves an AUC score of 0.743 when all of these biological data are integrated.
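As an illustration of the evaluation protocol only, the following minimal sketch shows how leave-one-out scoring and an AUC score can be computed on a toy network. It is not the MRF and Bayesian method proposed in the paper: the scoring rule (`neighbor_score`, the fraction of a gene's neighbors that are known disease genes), the toy graph, and all function names are assumptions introduced for this example.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def neighbor_score(gene, graph, known):
    """Hypothetical stand-in scorer (NOT the paper's model): the fraction
    of the gene's neighbors that are currently known disease genes."""
    nbrs = graph.get(gene, set())
    if not nbrs:
        return 0.0
    return sum(1 for n in nbrs if n in known) / len(nbrs)


def leave_one_out(graph, disease_genes):
    """Score every gene with its own label hidden from the scorer."""
    scores, labels = [], []
    for gene in sorted(graph):
        known = disease_genes - {gene}  # hide the held-out gene's label
        scores.append(neighbor_score(gene, graph, known))
        labels.append(1 if gene in disease_genes else 0)
    return scores, labels


# Toy interaction network (assumed data, for illustration only).
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}
disease_genes = {"A", "B", "C"}

scores, labels = leave_one_out(graph, disease_genes)
print(auc(scores, labels))
```

The pairwise definition of AUC is used here because it needs no external library and makes the handling of tied scores explicit; for large gene sets a rank-based computation would be more efficient.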