Improving Biomedical Question Answering by Data Augmentation and Model Weighting.

Authors

Du Yongping, Yan Jingya, Lu Yuxuan, Zhao Yiliang, Jin Xingnan

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1114-1124. doi: 10.1109/TCBB.2022.3171388. Epub 2023 Apr 3.

DOI:10.1109/TCBB.2022.3171388

Abstract

Biomedical question answering aims to extract the answer to a given question from a biomedical context. Because the field demands specialized expertise, large-scale datasets for domain-specific question answering are harder to build. Existing methods are limited by the lack of training data, perform worse than in open-domain settings, and degrade further on adversarial samples. We address these issues in three steps. First, we adopt data augmentation strategies to improve model training: sliding window, summarization, and round-trip translation. Second, we propose a model weighting strategy for final answer prediction in the biomedical domain that combines the strengths of two models: QANet, an open-domain model, and BioBERT, which is pre-trained on biomedical data. Finally, we apply adversarial training to strengthen the model's robustness. We evaluate the approach on the public biomedical dataset collected from PubMed and provided by the BioASQ challenge. The results show that performance improves significantly over each single model and over other models that participated in the BioASQ challenge. The model learns richer semantic representations from the augmented data and adversarial samples, which helps it solve more complex question answering problems in the biomedical domain.
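
As a rough illustration of two ideas named in the abstract, the sketch below shows a sliding window over a long context and a weighted combination of two models' answer scores. The abstract does not give the window sizes, the weighting formula, or the score format, so everything here is an assumption: the function names `sliding_windows` and `weighted_answer` are hypothetical, the weighting is a plain linear interpolation, and each model is assumed to return a score per candidate answer string. It is not the authors' implementation.

```python
from typing import Dict, List, Tuple


def sliding_windows(tokens: List[str], size: int, stride: int) -> List[List[str]]:
    """Split a token sequence into overlapping windows (hypothetical helper).

    Each window holds at most `size` tokens and starts `stride` tokens
    after the previous one; the last window may be shorter.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(tokens) <= size:
        return [list(tokens)]
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the context
        start += stride
    return windows


def weighted_answer(
    scores_a: Dict[str, float],
    scores_b: Dict[str, float],
    weight_a: float,
) -> Tuple[str, float]:
    """Pick the answer with the highest interpolated score.

    Assumption: the combined score is
    weight_a * score_a + (1 - weight_a) * score_b,
    with a missing candidate counted as 0.0 for that model.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    candidates = set(scores_a) | set(scores_b)
    if not candidates:
        raise ValueError("no candidate answers supplied")
    combined = {
        c: weight_a * scores_a.get(c, 0.0) + (1.0 - weight_a) * scores_b.get(c, 0.0)
        for c in candidates
    }
    # Sort first so that ties are broken deterministically by answer text.
    best = max(sorted(combined), key=lambda c: combined[c])
    return best, combined[best]
```

For example, `sliding_windows("a b c d e f g".split(), size=4, stride=2)` yields three windows, the last one holding the final three tokens. In practice the weight would be tuned on held-out data; the value to use is not stated in the abstract.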

Similar articles

An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition.

BMC Bioinformatics. 2015 Apr 30;16:138. doi: 10.1186/s12859-015-0564-6.

A Machine Learning-based Method for Question Type Classification in Biomedical Question Answering.

Methods Inf Med. 2017 May 18;56(3):209-216. doi: 10.3414/ME16-01-0116. Epub 2017 Mar 31.

Deep scaled dot-product attention based domain adaptation model for biomedical question answering.

Methods. 2020 Feb 15;173:69-74. doi: 10.1016/j.ymeth.2019.06.024. Epub 2019 Jun 26.

Word embeddings and external resources for answer processing in biomedical factoid question answering.

J Biomed Inform. 2019 Apr;92:103118. doi: 10.1016/j.jbi.2019.103118. Epub 2019 Feb 10.

Multi-label biomedical question classification for lexical answer type prediction.

J Biomed Inform. 2019 May;93:103143. doi: 10.1016/j.jbi.2019.103143. Epub 2019 Mar 12.

SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.

Artif Intell Med. 2020 Jan;102:101767. doi: 10.1016/j.artmed.2019.101767. Epub 2019 Nov 28.

External features enriched model for biomedical question answering.

BMC Bioinformatics. 2021 May 26;22(1):272. doi: 10.1186/s12859-021-04176-7.

Named Entity Aware Transfer Learning for Biomedical Factoid Question Answering.

IEEE/ACM Trans Comput Biol Bioinform. 2022 Jul-Aug;19(4):2365-2376. doi: 10.1109/TCBB.2021.3079339. Epub 2022 Aug 8.

Adversarial Knowledge Distillation Based Biomedical Factoid Question Answering.

IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):106-118. doi: 10.1109/TCBB.2022.3161032. Epub 2023 Feb 3.

Cited by

Rethinking Human-AI Collaboration in Complex Medical Decision Making: A Case Study in Sepsis Diagnosis.

Proc SIGCHI Conf Hum Factor Comput Syst. 2024 May;2024. doi: 10.1145/3613904.3642343. Epub 2024 May 11.

Question answering systems for health professionals at the point of care-a systematic review.

J Am Med Inform Assoc. 2024 Apr 3;31(4):1009-1024. doi: 10.1093/jamia/ocae015.
