Youn Jason, Naravane Tarini, Tagkopoulos Ilias
Department of Computer Science, University of California at Davis, Davis, CA, United States.
Genome Center, University of California at Davis, Davis, CA, United States.
Front Artif Intell. 2020 Nov 26;3:584784. doi: 10.3389/frai.2020.584784. eCollection 2020.
Food ontologies require significant effort to create and maintain because they involve manual, time-consuming tasks, often with limited alignment to the underlying food science knowledge. We propose a semi-supervised framework for automated ontology population from an existing ontology scaffold using word embeddings. Applying this framework to the food domain and evaluating it against an expert-curated ontology, FoodOn, we observe that the food word embeddings capture the latent relationships and characteristics of foods. The resulting ontology, which uses word embeddings trained on the Wikipedia corpus, has an improvement of 89.7% in precision when compared to the expert-curated ontology FoodOn (0.34 vs. 0.18, respectively, p-value = 2.6 × 10), and it has a 43.6% shorter path distance (hops) between predicted and actual food instances (2.91 vs. 5.16, respectively, p-value = 4.7 × 10) when compared to other methods. This work demonstrates how high-dimensional representations of food can be used to populate ontologies and paves the way for learning ontologies that integrate contextual information from a variety of sources and types.
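The idea of populating an ontology scaffold from word embeddings can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each class in the scaffold already holds a few seed entities, and it assigns every candidate entity to the class whose seed centroid has the highest cosine similarity. The function names (`cosine`, `populate`), the three-dimensional toy vectors, and the class labels are all hypothetical and stand in for real embeddings and real FoodOn classes.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def populate(scaffold, embeddings, candidates):
    """Assign each candidate to the class whose seed centroid is most similar.

    scaffold:   dict mapping class name -> list of seed entity names
    embeddings: dict mapping entity name -> embedding vector
    candidates: list of entity names not yet placed in the scaffold
    """
    # Represent each class by the mean embedding of its seed entities.
    centroids = {
        cls: np.mean([embeddings[s] for s in seeds], axis=0)
        for cls, seeds in scaffold.items()
    }
    assignment = {}
    for entity in candidates:
        scores = {cls: cosine(embeddings[entity], c) for cls, c in centroids.items()}
        assignment[entity] = max(scores, key=scores.get)
    return assignment

# Toy vectors invented for illustration only (not trained embeddings).
embeddings = {
    "apple":   np.array([0.90, 0.10, 0.00]),
    "pear":    np.array([0.80, 0.20, 0.10]),
    "cheddar": np.array([0.10, 0.90, 0.10]),
    "brie":    np.array([0.00, 0.80, 0.20]),
    "plum":    np.array([0.85, 0.15, 0.05]),
    "gouda":   np.array([0.05, 0.85, 0.10]),
}
scaffold = {"fruit": ["apple", "pear"], "cheese": ["cheddar", "brie"]}

predicted = populate(scaffold, embeddings, ["plum", "gouda"])
print(predicted)
```

In practice the vectors would come from a model trained on a large corpus such as Wikipedia, and the predicted placements would be scored against a curated ontology, for example by precision or by the number of hops between the predicted and the actual class.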