Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", Athens, Greece.
School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece.
Sci Data. 2023 Mar 27;10(1):170. doi: 10.1038/s41597-023-02068-4.
The BioASQ question answering (QA) benchmark dataset contains questions in English, along with gold-standard (reference) answers and related material. The dataset is designed to reflect the real information needs of biomedical experts and is therefore more realistic and challenging than most existing datasets. Furthermore, unlike most previous QA benchmarks, which contain only exact answers, the BioASQ-QA dataset also includes ideal answers (in effect, summaries), which are particularly useful for research on multi-document summarization. The dataset combines structured and unstructured data. The materials linked to each question comprise documents and snippets, which are useful for Information Retrieval and Passage Retrieval experiments, as well as concepts, which are useful in concept-to-text Natural Language Generation. Researchers working on paraphrasing and textual entailment can also measure the degree to which their methods improve the performance of biomedical QA systems. Finally, the dataset is continuously extended, because the BioASQ challenge is ongoing and keeps generating new data.
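The record structure described in the abstract (a question, its exact and ideal answers, and the linked documents, snippets, and concepts) can be sketched in Python as follows. This is a minimal illustration, not the official loader: the JSON layout and the field names (`questions`, `body`, `exact_answer`, `ideal_answer`, `documents`, `snippets`, `concepts`, `text`) are assumptions and should be checked against the released files, and the `SAMPLE` record contains placeholder values only, not data from the dataset.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional, Tuple


@dataclass
class QARecord:
    """One question with its reference answers and linked material.

    Field names are assumed for illustration; verify them against the
    actual dataset files before use.
    """
    body: str
    exact_answer: Optional[Any] = None
    ideal_answer: Optional[Any] = None
    documents: List[str] = field(default_factory=list)
    snippets: List[Dict[str, Any]] = field(default_factory=list)
    concepts: List[str] = field(default_factory=list)


def parse_records(raw: str) -> List[QARecord]:
    """Parse a JSON string with a top-level 'questions' list (assumed layout)."""
    data = json.loads(raw)
    return [
        QARecord(
            body=q["body"],
            exact_answer=q.get("exact_answer"),
            ideal_answer=q.get("ideal_answer"),
            documents=q.get("documents", []),
            snippets=q.get("snippets", []),
            concepts=q.get("concepts", []),
        )
        for q in data.get("questions", [])
    ]


def retrieval_pairs(records: List[QARecord]) -> Iterator[Tuple[str, str]]:
    """Yield (question, snippet text) pairs for passage-retrieval experiments."""
    for r in records:
        for s in r.snippets:
            yield r.body, s["text"]


def summarization_pairs(records: List[QARecord]) -> Iterator[Tuple[List[str], Any]]:
    """Yield (snippet texts, ideal answer) pairs for multi-document summarization."""
    for r in records:
        if r.ideal_answer is not None:
            yield [s["text"] for s in r.snippets], r.ideal_answer


# Hypothetical placeholder record, not taken from the dataset.
SAMPLE = json.dumps({"questions": [{
    "body": "placeholder question?",
    "exact_answer": "placeholder answer",
    "ideal_answer": "placeholder summary",
    "documents": ["doc-1", "doc-2"],
    "snippets": [
        {"document": "doc-1", "text": "placeholder snippet A"},
        {"document": "doc-2", "text": "placeholder snippet B"},
    ],
    "concepts": ["concept-1"],
}]})

records = parse_records(SAMPLE)
```

Keeping the exact answer and the ideal answer as separate optional fields mirrors the distinction the abstract draws between answer extraction and summarization: the same record can feed a retrieval experiment through `retrieval_pairs` and a summarization experiment through `summarization_pairs`.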