Wu Jain-Shing, Kao E-Fong, Lee Chung-Nan
Department of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan.
Department of Medical Imaging and Radiological Sciences, Kaohsiung Medical University, Kaohsiung, Taiwan.
PLoS One. 2014 Jun 10;9(6):e98826. doi: 10.1371/journal.pone.0098826. eCollection 2014.
Microarrays based on gene expression profiles (GEPs) can be tailored to a variety of topics, providing a precise and efficient means of discovering hidden information. This study proposes a novel way of using existing GEPs to reveal hidden relationships among diseases, genes, and drugs within PubMed, a rich biomedical database. Unlike the co-occurrence method, which considers only whether keywords appear, the proposed method also takes into account negative relationships and non-relationships among keywords, the importance of which has been demonstrated in previous studies. Three scenarios were used to verify the efficacy of the proposed method. In Scenario 1, disease and drug GEPs (diseases: lymphoma cancer and lymph node cancer; drug: cyclophosphamide) were used to obtain lists of disease- and drug-related genes, and fifteen hidden connections were identified between the diseases and the drug. In Scenario 2, different disease and drug GEPs (disease: AML-ALL dataset; drug: Gefitinib) were used to obtain lists of important disease- and drug-related genes, and ten hidden connections were identified. In Scenario 3, we obtained a list of disease-related genes from the disease-related GEP (liver cancer) and the drug (Capecitabine) on the PharmGKB website, which yielded twenty-two hidden connections. The experimental results demonstrate the efficacy of the proposed method in uncovering hidden connections among diseases, genes, and drugs. After the weight function was applied, many of the documents retrieved in each scenario were judged to be related: 834 of 4028 documents, 789 of 1216 documents, and 1928 of 3791 documents in Scenarios 1, 2, and 3, respectively. The negative-term filtering scheme also identified negative relationships and non-relationships among these related documents: 97 of 834, 38 of 789, and 202 of 1928 in Scenarios 1, 2, and 3, respectively.
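The document counts reported above are easier to compare as proportions. The following is a minimal sketch that only redoes the arithmetic on the figures quoted in the abstract; the names `counts` and `proportions` are illustrative assumptions and are not part of the published method.

```python
# Counts copied from the abstract, per scenario:
# (documents judged related, documents retrieved,
#  related documents flagged as negative or non-relationships)
counts = {
    "Scenario 1": (834, 4028, 97),
    "Scenario 2": (789, 1216, 38),
    "Scenario 3": (1928, 3791, 202),
}


def proportions(related, retrieved, flagged):
    """Return (share of retrieved documents judged related,
    share of related documents flagged by negative-term filtering)."""
    return related / retrieved, flagged / related


for name, (related, retrieved, flagged) in counts.items():
    related_share, flagged_share = proportions(related, retrieved, flagged)
    print(f"{name}: {related_share:.1%} related, {flagged_share:.1%} flagged")
```

On these figures, roughly 21%, 65%, and 51% of the retrieved documents were judged related in Scenarios 1, 2, and 3, and roughly 12%, 5%, and 10% of those related documents were flagged as negative relationships or non-relationships.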