Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing, 100190, China.
University of Chinese Academy of Sciences, Beijing, 100049, China.
BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):52. doi: 10.1186/s12911-019-0761-8.
Medical and clinical question answering (QA) has recently attracted considerable attention from researchers. Despite remarkable advances in this field, progress in the Chinese medical domain lags behind, which can be attributed to the difficulty of Chinese text processing and the lack of large-scale datasets. To bridge the gap, this paper introduces a Chinese medical QA dataset and proposes effective methods for the task.
We first construct a large-scale Chinese medical QA dataset. We then leverage deep matching neural networks to capture the semantic interaction between words in questions and answers. Because Chinese Word Segmentation (CWS) tools may fail to identify clinical terms, we design a module that merges word segments and produces a new representation. The module learns common compositions of words or segments with convolutional kernels and selects the strongest signals by windowed pooling.
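The merge-and-select idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the kernel width, the non-overlapping windows, and the toy embeddings are all assumptions made for illustration, and the abstract does not specify these details.

```python
from typing import List

Vector = List[float]


def conv1d(segments: List[Vector], kernel: List[Vector]) -> List[float]:
    """Slide one kernel (width = len(kernel)) over consecutive segment embeddings."""
    width = len(kernel)
    scores = []
    for start in range(len(segments) - width + 1):
        total = 0.0
        for offset in range(width):
            # Dot product between one segment embedding and one kernel row.
            total += sum(s * k for s, k in zip(segments[start + offset], kernel[offset]))
        scores.append(total)
    return scores


def windowed_max_pool(feature_maps: List[List[float]], window: int) -> List[Vector]:
    """Keep the strongest response of every feature map inside each window.

    Assumption: windows are non-overlapping; the abstract does not say.
    """
    length = len(feature_maps[0])
    pooled = []
    for start in range(0, length, window):
        pooled.append([max(fm[start:start + window]) for fm in feature_maps])
    return pooled


def clustered_representation(segments: List[Vector],
                             kernels: List[List[Vector]],
                             window: int) -> List[Vector]:
    """Hypothetical helper: convolve with every kernel, then pool by window."""
    feature_maps = [conv1d(segments, kernel) for kernel in kernels]
    return windowed_max_pool(feature_maps, window)


# Toy input: four segment embeddings of dimension 2 and two kernels of width 2.
segments = [[1, 0], [0, 1], [1, 1], [0, 0]]
kernels = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
]
merged = clustered_representation(segments, kernels, window=2)
```

Each output vector summarizes a window of adjacent segments, so a clinical term split across several segments can be represented as one unit before matching.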
We identify the best-performing tool among popular CWS tools on our dataset. In our experiments, deep matching models substantially outperform existing methods. Results also show that the proposed semantic clustered representation module improves model performance by up to 5.5% in Precision at 1 and 4.9% in Mean Average Precision.
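For readers unfamiliar with the two reported metrics, the following sketch shows how Precision at 1 and Mean Average Precision are commonly computed for ranked candidate answers. The function names and the toy labels are illustrative assumptions, not the authors' evaluation code or data.

```python
from typing import List


def precision_at_1(ranked_labels: List[List[int]]) -> float:
    """Fraction of questions whose top-ranked candidate is a correct answer."""
    hits = sum(1 for labels in ranked_labels if labels and labels[0] == 1)
    return hits / len(ranked_labels)


def average_precision(labels: List[int]) -> float:
    """Average of precision values taken at the rank of each correct answer."""
    hits = 0
    total = 0.0
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(ranked_labels: List[List[int]]) -> float:
    """Mean of the per-question average precision."""
    return sum(average_precision(labels) for labels in ranked_labels) / len(ranked_labels)


# Toy example: each inner list holds relevance labels (1 = correct answer)
# for one question, ordered by the model's ranking.
rankings = [[1, 0, 0], [0, 1, 0]]
p_at_1 = precision_at_1(rankings)
map_score = mean_average_precision(rankings)
```

In this toy example the first question is answered correctly at rank 1 and the second at rank 2, giving a Precision at 1 of 0.5 and a Mean Average Precision of 0.75.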
In this paper, we introduce a large-scale Chinese medical QA dataset and cast the task as a semantic matching problem. We also compare different CWS tools and input units. Of the two state-of-the-art deep matching neural networks evaluated, MatchPyramid performs better. Results also show the effectiveness of the proposed semantic clustered representation module.
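To make the semantic matching formulation concrete, here is a minimal sketch of the word-by-word matching matrix that MatchPyramid-style models build before applying convolution. The use of cosine similarity, the function names, and the toy embeddings are assumptions; the abstract does not state which similarity function the authors use.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def matching_matrix(question: List[Vector], answer: List[Vector]) -> List[List[float]]:
    """Entry [i][j] is the similarity between question unit i and answer unit j."""
    return [[cosine(q, a) for a in answer] for q in question]


# Toy embeddings for a two-unit question and a two-unit answer.
question = [[1.0, 0.0], [0.0, 1.0]]
answer = [[1.0, 0.0], [1.0, 1.0]]
matrix = matching_matrix(question, answer)
```

A matching model would then treat this matrix like an image and score the question-answer pair from the patterns it contains.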