ECAsT: a large dataset for conversational search and an evaluation of metric robustness.

Authors

Al-Thani Haya, Jansen Bernard J, Elsayed Tamer

Affiliations

College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.

Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar.

Publication information

PeerJ Comput Sci. 2023 Apr 17;9:e1328. doi: 10.7717/peerj-cs.1328. eCollection 2023.

DOI: 10.7717/peerj-cs.1328
PMID: 37346722
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10280565/
Abstract

The Text REtrieval Conference Conversational Assistance Track (CAsT) is an annual conversational passage retrieval challenge that aims to create a large-scale, open-domain conversational search benchmark. To date, however, the datasets used are small, with just over 1,000 turns and 100 conversation topics. In the first part of this research, we address this limitation by building a much larger, novel multi-turn conversation dataset for conversational search benchmarking called Expanded-CAsT (ECAsT). ECAsT is built using a multi-stage solution that combines conversational query reformulation with neural paraphrasing and includes a new model for creating multi-turn paraphrases. The meaning and diversity of the paraphrases are assessed with both human and automatic evaluation. Using this methodology, we produce and release to the research community a conversational search dataset that is 665% more extensive in terms of size and language diversity than what was available at the time of this study, with more than 9,200 turns. The augmented dataset provides not only more data but also more language diversity for training and testing neural conversational search models. In the second part of the research, we use ECAsT to assess the robustness of the traditional metrics used for conversational evaluation in CAsT and to identify their bias toward language diversity. The results show the benefits of adding language diversity for improving the collection of pooled passages and reducing evaluation bias. We found that introducing language diversity via paraphrases returned up to 24% new passages, compared with only 2% using the CAsT baseline.
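The "new passages" comparison in the abstract can be illustrated with a small sketch. It assumes, as a simplification that the abstract does not spell out, that a passage counts as new when a run retrieves it but it is absent from the existing judgment pool. The function name `new_passage_rate` and all passage IDs are hypothetical, and this is not necessarily the authors' exact computation.

```python
from typing import Iterable, Set


def new_passage_rate(pooled_ids: Set[str], run_ids: Iterable[str]) -> float:
    """Fraction of a run's distinct passages that are absent from the pool."""
    run = set(run_ids)
    if not run:
        return 0.0
    return len(run - pooled_ids) / len(run)


# Hypothetical passage IDs, for illustration only.
pool = {"p1", "p2", "p3", "p4"}
original_run = ["p1", "p2", "p3", "p4", "p5"]    # mostly already pooled
paraphrase_run = ["p1", "p2", "p6", "p7", "p8"]  # rewording surfaces other passages

print(new_passage_rate(pool, original_run))    # 0.2
print(new_passage_rate(pool, paraphrase_run))  # 0.6
```

Under this reading, a run whose passages are mostly unjudged is the kind of case where a fixed pool can bias evaluation, which is the concern the abstract raises.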

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e735/10280565/6c6fe169701b/peerj-cs-09-1328-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e735/10280565/0c5ba84e53ea/peerj-cs-09-1328-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e735/10280565/2485d4d2a24a/peerj-cs-09-1328-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e735/10280565/dc7bc0326732/peerj-cs-09-1328-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e735/10280565/f19ff830f295/peerj-cs-09-1328-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e735/10280565/9f3e759d757b/peerj-cs-09-1328-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e735/10280565/822e003b026c/peerj-cs-09-1328-g007.jpg

Similar articles

1
ECAsT: a large dataset for conversational search and an evaluation of metric robustness.
PeerJ Comput Sci. 2023 Apr 17;9:e1328. doi: 10.7717/peerj-cs.1328. eCollection 2023.
2
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3
Conversational Image Search.
IEEE Trans Image Process. 2021;30:7732-7743. doi: 10.1109/TIP.2021.3108724. Epub 2021 Sep 10.
4
Conversations in dementia with Lewy bodies: Resources and barriers in communication.
Int J Lang Commun Disord. 2023 Mar;58(2):419-432. doi: 10.1111/1460-6984.12799. Epub 2022 Dec 20.
5
Keep the Ball Rolling: Sustained Multiturn Conversational Episodes Are Associated With Child Language Ability.
Am J Speech Lang Pathol. 2022 Sep 7;31(5):2186-2194. doi: 10.1044/2022_AJSLP-21-00333. Epub 2022 Aug 15.
6
The conversational skills of school-aged children with cochlear implants.
Cochlear Implants Int. 2013 Mar;14(2):67-79. doi: 10.1179/1754762812Y.0000000002.
7
Design, development, and use of conversational agents in rehabilitation for adults with brain-related neurological conditions: a scoping review.
JBI Evid Synth. 2023 Feb 1;21(2):326-372. doi: 10.11124/JBIES-22-00025.
8
Interaction training for conversational partners of children with cerebral palsy: a systematic review.
Int J Lang Commun Disord. 2004 Apr-Jun;39(2):151-70. doi: 10.1080/13682820310001625598.
9
Paraphrasing to improve the performance of Electronic Health Records Question Answering.
AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:626-635. eCollection 2020.
10
Differences in information accessed in a pharmacologic knowledge base using a conversational agent vs traditional search methods.
Int J Med Inform. 2021 Sep;153:104530. doi: 10.1016/j.ijmedinf.2021.104530. Epub 2021 Jul 16.
