Kell Gregory, Roberts Angus, Umansky Serge, Khare Yuti, Ahmed Najma, Patel Nikhil, Simela Chloe, Coumbe Jack, Rozario Julian, Griffiths Ryan-Rhys, Marshall Iain J
King's College London, London, Greater London, United Kingdom.
Metadvice Ltd., London, Greater London, United Kingdom.
AMIA Annu Symp Proc. 2025 May 22;2024:590-599. eCollection 2024.
Clinical question answering systems have the potential to provide clinicians with relevant and timely answers to their questions. Despite the advances that have been made, however, adoption of these systems in clinical settings has been slow. One barrier is the lack of question-answering datasets that reflect the real-world needs of health professionals. In this work, we present RealMedQA, a dataset of realistic clinical questions generated by humans and an LLM. We describe the process for generating and verifying the QA pairs, and we evaluate several QA models on BioASQ and RealMedQA to assess the relative difficulty of matching answers to questions. We show that the LLM is more cost-efficient for generating "ideal" QA pairs. Additionally, RealMedQA exhibits lower lexical similarity between questions and answers than BioASQ, which poses an additional challenge to the top two QA models, as reflected in the results. We release our code and our dataset publicly to encourage further research.
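The abstract does not specify how lexical similarity between questions and answers was measured; a minimal sketch follows, assuming token-level Jaccard similarity as the metric. The QA pairs, function name, and scores below are purely illustrative and are not taken from RealMedQA or BioASQ.

```python
import re


def jaccard_similarity(question: str, answer: str) -> float:
    """Token-level Jaccard similarity between a question and its answer.

    Assumed metric for illustration only; the paper may use a different
    measure of lexical overlap.
    """
    def tokenize(text: str) -> set[str]:
        # Lowercase and split on word characters to get a token set.
        return set(re.findall(r"\w+", text.lower()))

    q_tokens, a_tokens = tokenize(question), tokenize(answer)
    if not q_tokens or not a_tokens:
        return 0.0
    return len(q_tokens & a_tokens) / len(q_tokens | a_tokens)


# Hypothetical QA pairs in the spirit of the dataset (not actual entries).
qa_pairs = [
    ("What is the first-line drug treatment for hypertension in adults?",
     "Offer an ACE inhibitor or an ARB when starting antihypertensive treatment."),
    ("Should statins be stopped before elective surgery?",
     "Continue statin therapy in the perioperative period unless contraindicated."),
]

for q, a in qa_pairs:
    # Lower scores indicate less word overlap, i.e. harder lexical matching.
    print(f"{jaccard_similarity(q, a):.3f}  {q}")
```

Under this assumed metric, a dataset-level comparison would average such scores over all QA pairs; a lower average than BioASQ's would indicate that retrieval models cannot rely on surface word overlap alone.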