Xiong Kai, Ding Xiao, Cao Yixin, Zhao Yang, Liu Ting, Qin Bing
Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology, Harbin, 150000, Heilongjiang, China.
Institute of Trustworthy Embodied AI, Fudan University, Shanghai, 200433, China.
Neural Netw. 2025 Nov;191:107768. doi: 10.1016/j.neunet.2025.107768. Epub 2025 Jun 30.
Deduction, abduction and induction are the three primary forms of logical reasoning. Although they complement one another, they are typically studied separately. In this paper, we investigate their roles in a unified paradigm and propose a continual joint reasoning framework. According to cognitive theory, the three reasoning methods can be combined in a dynamic cycle, so we design a three-level learning procedure. First, deduction and abduction are formulated as generation tasks and connected via dual learning, so that each validates and improves the other. Second, we introduce induction as a fact retriever to support and guide this dual learning. Finally, to alleviate data scarcity, we design a policy gradient method that enables continual improvement from inferred pseudo training data rather than expensive parallel annotations. In particular, we design three types of rewards to estimate the quality of the inferred pseudo training data and to avoid model collapse. Extensive experiments, including human evaluation, show how the three forms of reasoning affect one another and confirm that they work in synergy. Notably, our GPT-2-based framework achieves performance comparable to GPT-3.5 in human evaluation.
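To make the policy gradient step more concrete, the following is a minimal sketch of a REINFORCE-style update driven by a combined reward. It is not the authors' implementation: the abstract does not specify the three reward types, so the three scores per candidate, the equal weights, the moving-average baseline, and the toy pool of three candidates (standing in for inferred pseudo training examples) are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Convert logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_reward(scores, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three reward terms.

    ASSUMPTION: the three terms and their equal weights are hypothetical
    placeholders; the abstract only says that three reward types are used.
    """
    return sum(w * s for w, s in zip(weights, scores))


def reinforce(candidate_scores, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy over a fixed candidate pool with REINFORCE.

    Each candidate stands in for one inferred pseudo training example and
    carries a tuple of three (hypothetical) reward scores.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(candidate_scores)
    baseline = 0.0  # moving-average baseline to reduce gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one candidate from the current policy.
        action = rng.choices(range(len(logits)), weights=probs)[0]
        reward = combined_reward(candidate_scores[action])
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. logit i is 1[i == action] - probs[i].
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


# Toy pool: candidate 0 has the highest combined reward.
candidates = [(0.9, 0.8, 0.7), (0.2, 0.3, 0.1), (0.5, 0.4, 0.6)]
final_probs = reinforce(candidates)
print(final_probs)
```

In this toy setting the policy shifts probability mass toward the candidate with the highest combined reward. In the setting the abstract describes, the policy would instead be a generation model and the rewards would score the inferred pseudo training data.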