Faculty of Information Technology, University of Science, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam.
J Biomed Inform. 2023 Sep;145:104460. doi: 10.1016/j.jbi.2023.104460. Epub 2023 Aug 1.
Many knowledge graphs have been developed by automatically extracting and structuring knowledge from the literature, but none currently encodes relationships between food, biochemicals and mental illnesses, even though a large amount of knowledge about these relationships is available as unstructured text in biomedical articles. To address this limitation, this article describes the development of GENA (Graph of mEntal-health and Nutrition Association), a knowledge graph that represents relations between nutrition and mental health, extracted from biomedical abstracts. GENA is constructed from PubMed abstracts that contain keywords relating to chemicals, food, and health. A hybrid named entity recognition (NER) model is first applied to these abstracts to identify entities of interest. A deep syntax-based relation extraction model is then used to detect binary relations between the identified entities. Finally, the resulting relations are used to populate the GENA knowledge graph, whose relationships can be accessed in an intuitive and interpretable manner using the Neo4J Database Management System. To evaluate the reliability of GENA, two annotators manually assessed a subset of the extracted relations. The evaluation results show that our methods obtain high precision for the NER task, and acceptable precision and relative recall for the relation extraction task. GENA consists of 43,367 relationships that encode information about nutrition and health, of which 94.04% are new relations that are not present in existing ontologies of food and diseases. GENA is constructed based on scientific principles and has the potential to support further applications and scientific research within the domain. It is a pioneering knowledge graph in nutrition and mental health, containing a diverse range of relationship types.
All of our source code and results are publicly available at https://github.com/ddlinh/gena-db.
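To illustrate the final step described in the abstract, populating a graph database with extracted binary relations, the following is a minimal sketch and not the authors' implementation. It assumes relations arrive as (head, relation, tail) triples and turns them into parameterized Cypher `MERGE` statements of the kind Neo4j accepts. The sample triples, the `Entity` node label, and the helper names `relation_label`, `to_cypher`, and `build_statements` are all hypothetical; the actual GENA schema is defined in the repository above.

```python
import re

# Hypothetical example triples (head entity, relation, tail entity).
# These are illustrative only and are NOT taken from GENA.
TRIPLES = [
    ("omega-3 fatty acids", "reduces", "depression"),
    ("caffeine", "is associated with", "anxiety"),
    ("omega-3 fatty acids", "reduces", "depression"),  # duplicate extraction
]


def relation_label(relation: str) -> str:
    """Convert a free-text relation name into a safe Cypher relationship type.

    Relationship types cannot be passed as Cypher query parameters, so the
    name is restricted to letters, digits and underscores before it is
    interpolated into the query text.
    """
    label = re.sub(r"[^A-Za-z0-9]+", "_", relation).strip("_").upper()
    if not label:
        raise ValueError(f"relation {relation!r} has no usable characters")
    if label[0].isdigit():
        label = "REL_" + label  # unquoted identifiers must not start with a digit
    return label


def to_cypher(head: str, relation: str, tail: str):
    """Return (query, parameters) that upsert one relation.

    The assumed schema is a single node label `Entity` keyed by `name`.
    """
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation_label(relation)}]->(t)"
    )
    return query, {"head": head, "tail": tail}


def build_statements(triples):
    """Drop duplicate triples (keeping first-seen order) and build statements."""
    unique = list(dict.fromkeys(triples))
    return [to_cypher(*triple) for triple in unique]


if __name__ == "__main__":
    for query, params in build_statements(TRIPLES):
        print(query, params)
```

Each (query, parameters) pair could then be passed to a Neo4j driver session. Entity names are sent as parameters rather than interpolated into the query, and `MERGE` keeps repeated extractions of the same relation from creating duplicate nodes or edges.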