Department of Library and Information Science, Yonsei University, 50, Yonsei-ro, Seodaemun-gu, Seoul, Republic of Korea.
BMC Bioinformatics. 2018 Jun 13;19(Suppl 8):206. doi: 10.1186/s12859-018-2200-8.
Systems biology seeks to understand biological mechanisms as a whole, as systems of interacting biological components. One approach to understanding these complex and diverse mechanisms is to analyze biological pathways. However, because pathways are built from interactions, and information about those interactions is scattered across a large number of biomedical reports, text-mining techniques are essential for extracting these relationships automatically.
In this study, we applied node2vec, an algorithmic framework for feature learning in networks, to relationship extraction. We first extracted genes from paper abstracts using pkde4j, a text-mining tool for detecting entities and relationships. From the extracted genes we constructed a co-occurrence network, and we applied node2vec to that network to generate a latent representation. To assess how well node2vec extracts relationships between genes, we evaluated its performance on gene-gene interactions involved in a type 2 diabetes pathway and compared its results with those of baseline methods such as co-occurrence and DeepWalk.
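The two steps described above, building a gene co-occurrence network and sampling it with node2vec, can be illustrated with a minimal sketch. It is not the authors' implementation: the gene lists are made-up toy input standing in for pkde4j output, the function names `build_cooccurrence` and `node2vec_walk` are hypothetical, and the edge weight (number of abstracts mentioning both genes) is an assumption. Only the biased random walk is shown; in node2vec the sampled walks are then passed to a skip-gram model to learn the latent representation, which is omitted here.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(abstract_genes):
    """Return {gene: {neighbor: weight}}.

    Assumption: weight = number of abstracts that mention both genes.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for genes in abstract_genes:
        # Count each unordered gene pair once per abstract.
        for a, b in combinations(sorted(set(genes)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {gene: dict(nbrs) for gene, nbrs in graph.items()}


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Sample one second-order biased random walk (node2vec-style).

    p is the return parameter, q the in-out parameter. With p = q = 1
    the walk reduces to a weighted first-order walk, as in DeepWalk.
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # First step: no previous node, use plain edge weights.
            weights = [graph[cur][n] for n in nbrs]
        else:
            prev = walk[-2]
            weights = []
            for n in nbrs:
                w = float(graph[cur][n])
                if n == prev:
                    w /= p      # stepping back to the previous node
                elif n not in graph[prev]:
                    w /= q      # moving away from the previous node
                weights.append(w)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Toy input (hypothetical): genes detected in each of four abstracts.
abstract_genes = [
    ["INS", "INSR", "IRS1"],
    ["INSR", "IRS1"],
    ["IRS1", "PIK3CA"],
    ["INS", "SLC2A4"],
]

graph = build_cooccurrence(abstract_genes)
rng = random.Random(0)
walks = [node2vec_walk(graph, gene, length=5, p=1.0, q=0.5, rng=rng)
         for gene in sorted(graph)]
for walk in walks:
    print(" -> ".join(walk))
```

Once embeddings are learned from such walks, candidate gene-gene relationships could be ranked by a similarity measure between embedding vectors, for example cosine similarity; that scoring choice is likewise an assumption of this sketch.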
Node2vec outperformed the baseline methods in detecting relationships in the type 2 diabetes pathway, indicating that it is appropriate for capturing the relatedness between pairs of biological entities involved in biological pathways. These results show that node2vec is useful for automatic pathway construction.