Du Yongping, Yan Jingya, Lu Yuxuan, Zhao Yiliang, Jin Xingnan
IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1114-1124. doi: 10.1109/TCBB.2022.3171388. Epub 2023 Apr 3.
Biomedical question answering aims to extract an answer to a given question from a biomedical context. Because the domain demands a high level of expertise, large-scale datasets for domain-specific question answering are difficult to build. Existing methods are limited by the lack of training data, and their performance lags behind that in open-domain settings, degrading especially on adversarial samples. We address these issues in three ways. First, we adopt data augmentation strategies to improve model training, including sliding window, summarization, and round-trip translation. Second, we propose a model weighting strategy for final answer prediction in the biomedical domain, which combines the advantages of two models: the open-domain model QANet and BioBERT, which is pre-trained on biomedical-domain data. Finally, we apply adversarial training to reinforce the robustness of the model. We evaluate our approach on the public biomedical dataset collected from PubMed and provided by the BioASQ challenge. The results show that model performance improves significantly compared with the single models and with other models that participated in the BioASQ challenge. The model can learn richer semantic representations from data augmentation and adversarial samples, which helps solve more complex question answering problems in the biomedical domain.
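The abstract names sliding window as one of the data augmentation strategies but gives no implementation details. The following is a minimal sketch of one plausible reading, assuming whitespace tokenization and assuming that a window is kept as a new training context only when it still contains the full answer string. The function names `sliding_windows` and `augment` and the `size` and `stride` parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def sliding_windows(tokens: List[str], size: int, stride: int) -> List[Tuple[int, List[str]]]:
    """Split tokens into overlapping windows, returning (start_offset, window) pairs."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + size]))
        # Stop once the current window reaches the end of the token list.
        if start + size >= len(tokens):
            break
        start += stride
    return windows


def augment(context: str, answer: str, size: int = 8, stride: int = 4) -> List[str]:
    """Assumed filtering rule: keep only windows that contain the answer text."""
    tokens = context.split()  # assumption: plain whitespace tokenization
    candidates = (" ".join(window) for _, window in sliding_windows(tokens, size, stride))
    return [text for text in candidates if answer in text]
```

Each retained window can then be paired with the original question as an additional training example. A smaller `stride` yields more overlapping contexts at the cost of more near-duplicate samples.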