Al-Thani Haya, Jansen Bernard J, Elsayed Tamer
College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.
Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar.
PeerJ Comput Sci. 2023 Apr 17;9:e1328. doi: 10.7717/peerj-cs.1328. eCollection 2023.
The Text REtrieval Conference Conversational Assistance Track (CAsT) is an annual conversational passage retrieval challenge aimed at creating a large-scale open-domain conversational search benchmark. However, the datasets used to date are small, with just over 1,000 turns and 100 conversation topics. In the first part of this research, we address this dataset limitation by building a much larger novel multi-turn conversation dataset for conversational search benchmarking, called Expanded-CAsT (ECAsT). ECAsT is built with a multi-stage pipeline that combines conversational query reformulation and neural paraphrasing, and it includes a new model for creating multi-turn paraphrases. The meaning preservation and diversity of the paraphrases are evaluated with both human and automatic evaluation. Using this methodology, we produce and release to the research community a conversational search dataset that is 665% larger in size and language diversity than any available at the time of this study, with more than 9,200 turns. The augmented dataset provides not only more data but also greater language diversity, improving the training and testing of neural conversational search models. In the second part of the research, we use ECAsT to assess the robustness of the traditional evaluation metrics used in CAsT and identify their bias toward language diversity. Results show the benefits of adding language diversity for improving the collection of pooled passages and reducing evaluation bias. We found that introducing language diversity via paraphrases returned up to 24% new passages, compared to only 2% for the CAsT baseline.
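The pool-expansion effect reported above (24% new passages from paraphrased queries vs. 2% from the baseline) amounts to measuring what fraction of a run's retrieved passages are absent from the existing judgment pool. A minimal sketch of that calculation, using hypothetical passage IDs (the function name and data are illustrative, not from the paper):

```python
def new_passage_rate(existing_pool, run_passages):
    """Fraction of a run's retrieved passages that are not already
    in the existing pooled-judgment set."""
    run = set(run_passages)
    if not run:
        return 0.0
    return len(run - set(existing_pool)) / len(run)

# Hypothetical IDs for illustration only.
pool = {"p1", "p2", "p3", "p4"}            # passages already pooled for judging
paraphrase_run = ["p3", "p4", "p5", "p6", "p7"]  # run from a paraphrased query

print(new_passage_rate(pool, paraphrase_run))  # 3 of 5 passages are new -> 0.6
```

A higher rate means the reformulated queries surface passages the original pool never judged, which is the source of the evaluation bias the abstract describes.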