Cruchet Sarah, Gaudinat Arnaud, Boyer Célia
Health On the Net Foundation, Geneva, Switzerland.
Stud Health Technol Inform. 2008;136:407-12.
Many attempts have been made in the question answering (QA) domain, but no system applicable to the health field is currently available on the Internet. This paper describes a bilingual French/English QA system adapted to the health domain, focusing on the detection of the question's model. The Question Analyzer module, which identifies the question's model, has a major effect on the rest of the QA system. Our initial hypothesis is that a question can be defined by two criteria: the type of answer expected and the medical type. Both must be identified during model detection in order to better define the type of question and thus the corresponding answer. To this end, questions were collected from the Internet and given to experts, who classified them according to criteria such as the type of question and the type of medical context. In addition, supervised and unsupervised classification tests were run to determine the features of questions, and classification algorithms were chosen on the basis of this first step. The categorizers giving the best results were support vector machines (SVMs). Currently, for a set of 100 questions, 84 are correctly categorized in English and 68 in French according to the type of answer expected. These figures fall to less than 50% for the medical type. Evaluations showed that the system identifies the type of answer expected well but could be improved for the medical type. This leads us to use an external source of knowledge, the Unified Medical Language System (UMLS). A future improvement will be to use the UMLS Semantic Network to better categorize a query according to the medical domain.
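The two-criteria question model and the per-criterion figures reported above can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce their SVM categorizers: the class `QuestionModel`, the function `accuracy_by_criterion`, the category labels, and the toy data are all hypothetical, chosen only to show how a question can carry two labels and how each label can be scored separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionModel:
    """The two criteria the abstract uses to define a question."""
    answer_type: str   # hypothetical label, e.g. "definition"
    medical_type: str  # hypothetical label, e.g. "treatment"


def accuracy_by_criterion(gold, predicted):
    """Return the fraction of correct labels for each criterion separately."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    total = len(gold)
    pairs = list(zip(gold, predicted))
    answer_hits = sum(g.answer_type == p.answer_type for g, p in pairs)
    medical_hits = sum(g.medical_type == p.medical_type for g, p in pairs)
    return {
        "answer_type": answer_hits / total,
        "medical_type": medical_hits / total,
    }


# Toy data invented for illustration only (not taken from the paper's corpus).
gold = [
    QuestionModel("definition", "disease"),
    QuestionModel("yes_no", "treatment"),
    QuestionModel("list", "symptom"),
    QuestionModel("definition", "treatment"),
]
predicted = [
    QuestionModel("definition", "disease"),
    QuestionModel("yes_no", "disease"),
    QuestionModel("list", "treatment"),
    QuestionModel("procedure", "treatment"),
]

scores = accuracy_by_criterion(gold, predicted)
print(scores)
```

Scoring the two criteria independently mirrors how the abstract reports its results: one figure for the type of answer expected and a separate, lower figure for the medical type.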