Yao Junping, Yuan Cong, Li Xiaojun, Wang Yijing, Su Yi
Xi'an Research Inst. of High-Tech, Xi'an, Shaanxi, China.
PeerJ Comput Sci. 2023 Dec 8;9:e1725. doi: 10.7717/peerj-cs.1725. eCollection 2023.
Answer sorting and filtering are two closely related steps in determining the answer to a question. Answer sorting produces a score-ordered list of answers based on Top-k and contextual criteria. Answer filtering then refines the selection according to other criteria, such as the time range the user expects. However, the number of answers and the time constraints are often unclear, and false-positive results can receive high scores, so traditional sorting and selection methods cannot guarantee answer quality for multi-answer questions. This study therefore proposes MATQA, a component for multi-answer temporal question reasoning that uses a re-validation framework to convert the Top-k answer list output by a QA system into an answer combination with a definite number of answers; a new multi-answer evaluation metric is also proposed for this output form. First, a highly correlated subgraph is selected by calculating the scores of the boot node and the related fact nodes. Second, a subgraph attention inference module is introduced to determine the initial answer with the highest probability. Finally, the alternative answers are clustered at the semantic level and at the time-constraint level, and candidate answers that have similar types and high scores but do not satisfy the semantic or time constraints are eliminated, to ensure the number and accuracy of the final answers. Experiments on the multi-answer TimeQuestions dataset demonstrate the effectiveness of the answer combinations output by MATQA.
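The re-validation idea, turning a ranked Top-k list into a definite set of answers, can be illustrated with a minimal sketch. This is not the MATQA implementation: the abstract does not specify how clustering or constraint checking is done, so the `Candidate` fields, the `select_answers` function, the `min_ratio` threshold, and the exact-match type test below are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    answer: str
    score: float
    entity_type: str  # hypothetical semantic type label
    year: int         # hypothetical single time stamp of the supporting fact


def select_answers(candidates, time_range, min_ratio=0.5):
    """Reduce a Top-k list to the answers consistent with the initial answer.

    Assumptions (not from the paper): the initial answer is the
    highest-scoring candidate, the semantic constraint is an exact match
    on its type, and the time constraint is an inclusive year range.
    """
    if not candidates:
        return []
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    initial = ranked[0]
    start, end = time_range
    kept = []
    for c in ranked:
        same_type = c.entity_type == initial.entity_type
        in_time = start <= c.year <= end
        strong = c.score >= min_ratio * initial.score
        # A high score alone is not enough: both constraints must hold.
        if same_type and in_time and strong:
            kept.append(c.answer)
    return kept


# Placeholder data, invented for this example only.
top_k = [
    Candidate("film_a", 0.92, "film", 1991),
    Candidate("film_b", 0.88, "film", 1994),
    Candidate("film_c", 0.85, "film", 1999),      # high score, outside range
    Candidate("person_x", 0.80, "person", 1992),  # high score, wrong type
    Candidate("film_d", 0.20, "film", 1993),      # right type and time, weak
]
answers = select_answers(top_k, (1990, 1995))
```

With the placeholder data, `answers` contains only `film_a` and `film_b`: the two high-scoring candidates that violate the time or type constraint are dropped, which is the false-positive case the abstract describes.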