Gharibi Mohamed, Zachariah Arun, Rao Praveen
Department of Computer Science and Electrical Engineering, University of Missouri-Kansas City, Kansas City, MO, United States.
Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO, United States.
Front Big Data. 2020 Apr 29;3:12. doi: 10.3389/fdata.2020.00012. eCollection 2020.
Although a plethora of datasets related to Food, Energy, and Water (FEW) exists on the Internet, reliable methods and tools that can consume these resources are lacking. This hinders the development of novel decision-making applications that use knowledge graphs. In this paper, we introduce FoodKG, a novel software tool that enriches FEW knowledge graphs using advanced machine learning techniques. Our overarching goal is to improve decision-making and knowledge discovery, and to provide better search results for data scientists in the FEW domains. Given an input knowledge graph (constructed from raw FEW datasets), FoodKG enriches it with semantically related triples, relations, and images based on the terms and classes of the original dataset. FoodKG employs an existing graph embedding technique trained on AGROVOC, a controlled vocabulary published by the Food and Agriculture Organization of the United Nations that covers terms and classes in the agriculture and food domains. As a result, FoodKG can enhance knowledge graphs with semantic similarity scores and relations between classes, classify existing entities, and allow FEW experts and researchers to use scientific terms for describing FEW concepts. The model obtained after training on AGROVOC was evaluated against state-of-the-art word embedding and knowledge graph embedding models trained on the same dataset. It outperformed these competitors on the Spearman Correlation Coefficient score.
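The evaluation named in the abstract, ranking models by Spearman Correlation Coefficient, can be illustrated with a minimal sketch. It assumes the common setup in which a model's similarity score for each term pair (here, the cosine similarity of two embedding vectors) is compared with a reference score for the same pair. The vectors, term pairs, and reference scores below are invented for illustration and are not taken from FoodKG or AGROVOC; the function names are hypothetical as well.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical 3-dimensional embeddings (illustrative values only).
embeddings = {
    "rice": [0.9, 0.1, 0.0],
    "wheat": [0.8, 0.2, 0.1],
    "irrigation": [0.3, 0.8, 0.1],
    "solar": [0.0, 0.2, 0.9],
}

# Hypothetical reference similarity scores for term pairs (illustrative only).
reference = [
    ("rice", "wheat", 0.9),
    ("rice", "irrigation", 0.5),
    ("wheat", "solar", 0.1),
    ("irrigation", "solar", 0.2),
]

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in reference]
reference_scores = [score for _, _, score in reference]

rho = spearman(model_scores, reference_scores)
print(f"Spearman correlation: {rho:.3f}")
```

Because Spearman correlation depends only on rank order, a model scores well when it orders term pairs the same way the reference does, even if its absolute similarity values differ. In practice a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written version.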