Gwangju Institute of Science and Technology, School of Electrical Engineering and Computer Science, Gwangju, 61005, Korea.
1 Fusionopolis Way, #21-01 Connexis (South Tower), 138632, Singapore.
Sci Rep. 2017 Jan 5;7:40154. doi: 10.1038/srep40154.
Diseases arise from abnormal behavior of genes in biological events such as gene regulation, mutation, phosphorylation, epigenetics, and post-translational modification. Many text-mining studies have attempted to identify relationships between genes and diseases by mining the literature, but they did not consider the biological events in which genes behave abnormally in response to diseases. In this study, we propose identifying, from Medline abstracts, disease-related genes that are involved in disease development through biological events. We identified associations between 13,054 genes and 4,494 disease types, covering more disease-related genes than manually curated databases for all disease types (e.g., Online Mendelian Inheritance in Man) and than those for specific diseases (e.g., Alzheimer's disease and hypertension). We show that the text-mining findings are reliable at PubMed scale, in that the disease-disease relationships inferred from the literature-wide findings are similar to those inferred from manually curated databases in a well-known study. In addition, the literature-wide distribution of biological events across disease types reveals distinct characteristics of those disease types.