Department of Library and Information Science, Yonsei University, Seoul 03722, Korea.
Institute of Convergence, Yonsei University, Seoul 03722, Korea.
Genes (Basel). 2019 Feb 19;10(2):159. doi: 10.3390/genes10020159.
Although there are many studies of drugs and their side effects, the underlying mechanisms of these side effects are not well understood. It is also difficult to understand the specific pathways between drugs and side effects.
The present study seeks to construct putative paths between drugs and their side effects by applying text-mining techniques to free text of biomedical studies, and to develop ranking metrics that could identify the most-likely paths.
We extracted three types of relationships (drug-protein, protein-protein, and protein-side effect) from biomedical texts by using text mining and predefined relation-extraction rules. Based on the extracted relationships, we constructed whole drug-protein-side effect paths. For each path, we calculated a ranking score using a new ranking function that combines corpus- and ontology-based semantic similarity as well as co-occurrence frequency.
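The path-construction step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_hops` limit, the undirected treatment of protein-protein relations, and the weighted-sum score are all assumptions made for illustration, and the entity names are toy placeholders. The abstract does not state how the three ranking signals are actually combined.

```python
from collections import defaultdict


def build_paths(drug_protein, protein_protein, protein_side_effect, max_hops=1):
    """Enumerate drug -> protein (-> protein ...) -> side-effect paths.

    Each argument is a list of (source, target) pairs from relation extraction.
    max_hops caps the number of protein-protein steps (an assumption of this sketch).
    """
    neighbours = defaultdict(set)
    for a, b in protein_protein:
        neighbours[a].add(b)
        neighbours[b].add(a)  # assumption: interactions are treated as undirected

    side_effects = defaultdict(set)
    for protein, effect in protein_side_effect:
        side_effects[protein].add(effect)

    paths = set()
    for drug, first_protein in drug_protein:
        stack = [[drug, first_protein]]
        while stack:
            path = stack.pop()
            last = path[-1]
            # Close the path with every side effect linked to the last protein.
            for effect in side_effects.get(last, ()):
                paths.add(tuple(path + [effect]))
            # A path [drug, p1, ..., pk] has used k - 1 protein-protein steps.
            if len(path) - 2 < max_hops:
                for nxt in neighbours.get(last, ()):
                    if nxt not in path:  # avoid revisiting a protein
                        stack.append(path + [nxt])
    return sorted(paths)


def combined_score(corpus_sim, ontology_sim, cooccurrence, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical weighted sum of three signals, each assumed scaled to [0, 1]."""
    w_c, w_o, w_f = weights
    return w_c * corpus_sim + w_o * ontology_sim + w_f * cooccurrence


# Toy relations with placeholder names.
toy_paths = build_paths(
    drug_protein=[("drugA", "P1")],
    protein_protein=[("P1", "P2")],
    protein_side_effect=[("P1", "rash"), ("P2", "nausea")],
)
```

With these toy relations, `toy_paths` holds one direct path through `P1` and one longer path that passes through `P2`. Candidate paths would then be sorted by a score such as `combined_score` once the three similarity signals have been computed for each path.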
We extracted 13 plausible biomedical paths connecting drugs and their side effects from cancer-related abstracts in the PubMed database. The top 20 paths were examined, and the proposed ranking function outperformed the other methods tested, including co-occurrence, COALS, and UMLS, as measured by precision at k (P@5 through P@20). In addition, we confirmed that the paths are novel hypotheses that are worth investigating further.
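For readers unfamiliar with the P@k measure used in this comparison, the following sketch shows the standard definition: the fraction of the top k ranked items that are judged relevant. The function name and the toy labels are illustrative assumptions, not data from the study.

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the first k ranked items that appear in relevant_items."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / k


# Toy ranking with placeholder path labels.
ranked = ["path_a", "path_b", "path_c", "path_d", "path_e"]
relevant = {"path_a", "path_c", "path_e"}
p_at_2 = precision_at_k(ranked, relevant, 2)
p_at_5 = precision_at_k(ranked, relevant, 5)
```

Computing the measure at several cutoffs, such as k = 5, 10, 15, and 20, shows whether a ranking function places plausible paths near the top of the list rather than merely somewhere within it.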
The risk of side effects has been an important issue for the US Food and Drug Administration (FDA). However, the causes and mechanisms of such side effects have not been fully elucidated. This study extends previous research on understanding drug side effects by using various techniques such as Named Entity Recognition (NER), Relation Extraction (RE), and semantic similarity.
It is not easy to reveal the biomedical mechanisms of side effects because of the huge number of possible paths. However, the proposed approach automatically generated predicted paths, which could give biomedical researchers meaningful information for forming plausible hypotheses about such mechanisms.