Szostak Justyna, Ansari Sam, Madan Sumit, Fluck Juliane, Talikka Marja, Iskandar Anita, De Leon Hector, Hofmann-Apitius Martin, Peitsch Manuel C, Hoeng Julia
Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland
Database (Oxford). 2015 Jun 17;2015:bav057. doi: 10.1093/database/bav057.
Capture and representation of scientific knowledge in a structured format are essential to improve the understanding of biological mechanisms involved in complex diseases. Biological knowledge and knowledge about standardized terminologies are difficult to capture from literature in a usable form. A semi-automated knowledge extraction workflow is presented that was developed to allow users to extract causal and correlative relationships from scientific literature and to transcribe them into the computable and human readable Biological Expression Language (BEL). The workflow combines state-of-the-art linguistic tools for recognition of various entities and extraction of knowledge from literature sources. Unlike most other approaches, the workflow outputs the results to a curation interface for manual curation and converts them into BEL documents that can be compiled to form biological networks. We developed a new semi-automated knowledge extraction workflow that was designed to capture and organize scientific knowledge and reduce the required curation skills and effort for this task. The workflow was used to build a network that represents the cellular and molecular mechanisms implicated in atherosclerotic plaque destabilization in an apolipoprotein-E-deficient (ApoE(-/-)) mouse model. The network was generated using knowledge extracted from the primary literature. The resultant atherosclerotic plaque destabilization network contains 304 nodes and 743 edges supported by 33 PubMed referenced articles. A comparison between the semi-automated and conventional curation processes showed similar results, but significantly reduced curation effort for the semi-automated process. Creating structured knowledge from unstructured text is an important step for the mechanistic interpretation and reusability of knowledge. Our new semi-automated knowledge extraction workflow reduced the curation skills and effort required to capture and organize scientific knowledge. 
The atherosclerotic plaque destabilization network that was generated is a causal network model for vascular disease, and it demonstrates the usefulness of the workflow for knowledge extraction and for the construction of mechanistically meaningful biological networks.
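To make the idea of compiling BEL statements into a network more concrete, the following minimal Python sketch parses a few BEL-style statements into a set of nodes and a list of edges. It is not the authors' workflow or the BEL compiler: the function name `compile_network`, the regular expression, and the example statements are illustrative assumptions, and only a small subset of BEL relationship keywords is handled.

```python
import re

# Subset of BEL relationship keywords handled by this sketch (assumption:
# the full language defines more relationships than are listed here).
RELATIONS = (
    "directlyIncreases", "directlyDecreases",
    "increases", "decreases",
    "positiveCorrelation", "negativeCorrelation", "association",
)

# A statement is read as "<subject term> <relation> <object term>".
# Terms may contain spaces inside quoted names, so the subject is matched lazily
# up to the first whitespace-delimited relation keyword.
STATEMENT = re.compile(
    r"^(?P<subj>.+?)\s+(?P<rel>" + "|".join(RELATIONS) + r")\s+(?P<obj>.+)$"
)


def compile_network(statements):
    """Return (nodes, edges, skipped) for an iterable of BEL-style strings.

    nodes   -- set of term strings
    edges   -- list of (subject, relation, object) tuples
    skipped -- list of lines that did not match the simplified pattern
    """
    nodes, edges, skipped = set(), [], []
    for line in statements:
        match = STATEMENT.match(line.strip())
        if match is None:
            skipped.append(line)
            continue
        subj, rel, obj = match.group("subj", "rel", "obj")
        nodes.update((subj, obj))
        edges.append((subj, rel, obj))
    return nodes, edges, skipped


# Illustrative statements only; they are NOT taken from the published network.
example = [
    'p(MGI:Mmp9) increases bp(GOBP:"extracellular matrix disassembly")',
    'bp(GOBP:"extracellular matrix disassembly") increases path(MESHD:"Plaque, Atherosclerotic")',
    'p(MGI:Timp1) directlyDecreases p(MGI:Mmp9)',
    'this line is not a BEL statement',
]

nodes, edges, skipped = compile_network(example)
print(len(nodes), "nodes,", len(edges), "edges,", len(skipped), "skipped")
```

In this toy representation, the node and edge counts reported for a network (304 nodes and 743 edges for the plaque destabilization network) correspond to the sizes of the `nodes` set and `edges` list. A real BEL document also carries citations, evidence text, and annotations for each statement, which this sketch omits.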