Riley Heather, Sridharan Mohan
Electrical and Computer Engineering, The University of Auckland, Auckland, New Zealand.
Intelligent Robotics Lab, School of Computer Science, University of Birmingham, Birmingham, United Kingdom.
Front Robot AI. 2019 Dec 11;6:125. doi: 10.3389/frobt.2019.00125. eCollection 2019.
State-of-the-art algorithms for many pattern recognition problems rely on data-driven deep network models. Training these models requires a large labeled dataset and considerable computational resources. In addition, it is difficult to understand how these learned models work, limiting their use in some critical applications. Toward addressing these limitations, our architecture draws inspiration from research in cognitive systems and integrates the principles of commonsense logical reasoning, inductive learning, and deep learning. As a motivating example of a task that requires explainable reasoning and learning, we consider Visual Question Answering: given an image of a scene, the objective is to answer explanatory questions about objects in the scene, their relationships, or the outcome of executing actions on these objects. In this context, our architecture uses deep networks for extracting features from images and for generating answers to queries. Between these deep networks, it embeds components for non-monotonic logical reasoning with incomplete commonsense domain knowledge, and for decision tree induction. It also incrementally learns, and reasons with, previously unknown constraints governing the domain's states. We evaluated the architecture on datasets of simulated and real-world images, and on a simulated robot computing, executing, and providing explanatory descriptions of plans and of its experiences during plan execution. Experimental results indicate that, in comparison with an "end-to-end" architecture of deep networks, our architecture provides better accuracy on classification problems when the training dataset is small, comparable accuracy with larger datasets, and more accurate answers to explanatory questions. Furthermore, incrementally acquiring previously unknown constraints improves the ability to answer explanatory questions, and extending non-monotonic logical reasoning to support planning and diagnostics improves the reliability and efficiency of computing and executing plans on a simulated robot.
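To make the hybrid design described above concrete, the sketch below illustrates the general idea (it is not the authors' implementation): features extracted from an image are first checked against an explicit, hand-written domain constraint, which stands in for the non-monotonic logical reasoning component; only if no constraint applies does an interpretable decision tree, induced from a small labeled dataset, produce the answer. All feature names, labels, and the rule itself are illustrative assumptions.

from sklearn.tree import DecisionTreeClassifier

# Toy feature vectors of the kind a deep feature extractor might produce:
# [estimated_height, estimated_width, is_on_top_of_another_object]
X_train = [
    [0.9, 0.4, 0],
    [0.2, 0.6, 1],
    [1.1, 0.5, 0],
    [0.3, 0.7, 1],
]
y_train = ["stable", "unstable", "stable", "unstable"]  # hypothetical labels

# Inductive learning step: a shallow decision tree trained on the small labeled
# set stays interpretable, and its branches can be inspected as candidate rules.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X_train, y_train)

def commonsense_override(features):
    """Stand-in for reasoning with commonsense domain knowledge: if a known
    constraint applies, it overrides the learned classifier; otherwise the
    default (learned) conclusion holds. The rule below is an assumed example."""
    height, width, on_top = features
    if on_top and height > 1.0:
        return "unstable"  # assumed constraint: tall stacked objects tip over
    return None  # no constraint fires; defer to the learned model

def classify(features):
    answer = commonsense_override(features)
    if answer is None:
        answer = tree.predict([features])[0]
    return answer

print(classify([1.2, 0.5, 1]))  # constraint fires -> "unstable"
print(classify([0.8, 0.4, 0]))  # no constraint; the decision tree answers

In the full architecture, the rule-based step is replaced by non-monotonic logical reasoning over a richer commonsense knowledge base, and the learned component can propose previously unknown constraints that are then added to that knowledge base.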