Zhu Shanfeng, Okuno Yasushi, Tsujimoto Gozoh, Mamitsuka Hiroshi
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan.
Cancer Inform. 2007 Feb 25;2:361-71.
An important problem in current medical research is identifying the genes that are strongly related to an inherited disease. Cancer-gene relations receive particular attention because some types of cancer are inherited. As biomedical databases have grown rapidly in recent years, an informatics approach is needed to predict such relations from the databases now available. Our objective is to find implicitly associated cancer-gene pairs from biomedical databases, including the literature database. Co-occurrence of biological entities is a popular and efficient technique in biomedical text mining. We applied a new probabilistic model, the mixture aspect model (MAM) [48], to combine different types of co-occurrences of genes and cancers derived from Medline and OMIM (Online Mendelian Inheritance in Man). We trained the probability parameters of MAM with a learning method based on the expectation-maximization (EM) algorithm. We evaluated MAM by predicting associated cancer-gene pairs. In cross-validation, prediction accuracy improved when gene-gene co-occurrences from Medline were added to cancer-gene co-occurrences in OMIM. Further experiments showed that MAM found new cancer-gene relations that are unknown in the literature. Supplementary information can be found at http://www.bic.kyotou.ac.jp/pathway/zhusf/CancerInformatics/Supplemental2006.html.
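To make the EM-based training step concrete, the following is a minimal sketch of EM for a plain single-matrix aspect model, P(x, y) = Σ_z P(z) P(x|z) P(y|z), fitted to a cancer-by-gene co-occurrence count matrix. It is not the MAM of [48]: the abstract does not specify how MAM mixes several co-occurrence types, so that part is omitted. The function names (`fit_aspect_model`, `score`), the toy counts, and the number of latent classes are all illustrative assumptions.

```python
import math
import random


def fit_aspect_model(counts, n_latent, n_iter=30, seed=0):
    """Fit P(x, y) = sum_z P(z) P(x|z) P(y|z) to a count matrix with EM.

    counts[x][y] is a non-negative co-occurrence count (here: cancer x, gene y).
    Returns ((p_z, p_x_z, p_y_z), history), where history holds the
    log-likelihood measured at the start of each iteration.
    """
    n_x, n_y = len(counts), len(counts[0])
    if sum(sum(row) for row in counts) == 0:
        raise ValueError("counts must contain at least one non-zero entry")
    rng = random.Random(seed)

    def random_dist(n):
        # Strictly positive random weights, normalised to sum to 1.
        w = [rng.random() + 0.1 for _ in range(n)]
        s = sum(w)
        return [v / s for v in w]

    p_z = [1.0 / n_latent] * n_latent
    p_x_z = [random_dist(n_x) for _ in range(n_latent)]  # p_x_z[z][x]
    p_y_z = [random_dist(n_y) for _ in range(n_latent)]  # p_y_z[z][y]
    history = []

    for _ in range(n_iter):
        acc_z = [0.0] * n_latent
        acc_x = [[0.0] * n_x for _ in range(n_latent)]
        acc_y = [[0.0] * n_y for _ in range(n_latent)]
        log_lik = 0.0
        for x in range(n_x):
            for y in range(n_y):
                n = counts[x][y]
                if n == 0:
                    continue
                # E-step: posterior over latent classes for this pair.
                joint = [p_z[z] * p_x_z[z][x] * p_y_z[z][y]
                         for z in range(n_latent)]
                total = sum(joint)
                log_lik += n * math.log(total)
                for z in range(n_latent):
                    r = n * joint[z] / total  # expected count for class z
                    acc_z[z] += r
                    acc_x[z][x] += r
                    acc_y[z][y] += r
        history.append(log_lik)
        # M-step: renormalise the expected counts.
        mass = sum(acc_z)
        for z in range(n_latent):
            if acc_z[z] > 0:
                p_x_z[z] = [v / acc_z[z] for v in acc_x[z]]
                p_y_z[z] = [v / acc_z[z] for v in acc_y[z]]
            p_z[z] = acc_z[z] / mass
    return (p_z, p_x_z, p_y_z), history


def score(model, x, y):
    """Model probability of the pair (x, y)."""
    p_z, p_x_z, p_y_z = model
    return sum(p_z[z] * p_x_z[z][x] * p_y_z[z][y] for z in range(len(p_z)))


# Toy counts (made up for illustration): rows are cancers, columns are genes.
toy_counts = [
    [3, 3, 0, 0],
    [3, 0, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 3, 3],
]
model, history = fit_aspect_model(toy_counts, n_latent=2)
# Unobserved pairs could then be ranked by their model probability.
candidates = sorted(
    ((score(model, x, y), x, y)
     for x in range(4) for y in range(4) if toy_counts[x][y] == 0),
    reverse=True,
)
```

In a setting like the one described above, unobserved cancer-gene pairs would be ranked by `score`, and the top-ranked pairs treated as candidate relations. The sketch uses only the standard library so the update equations stay visible; a practical implementation would use vectorised array operations and multiple random restarts, since EM only reaches a local optimum.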