Wakamiya Shoko, Morita Mizuki, Kano Yoshinobu, Ohkuma Tomoko, Aramaki Eiji
Institute for Research Initiatives, Nara Institute of Science and Technology, Ikoma, Japan.
Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan.
J Med Internet Res. 2019 Feb 20;21(2):e12783. doi: 10.2196/12783.
The amount of medical and clinical information on the Web is increasing. Among the types of information available, social media-based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research that exploits social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) task provides pseudo-Twitter messages in a cross-language, multilabel corpus covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Participants then classify each tweet into one of two categories: those containing a patient's symptom and those that do not.
This study aimed to present the results of groups participating in a Japanese subtask, English subtask, and Chinese subtask along with discussions, to clarify the issues that need to be resolved in the field of medical NLP.
In total, 8 groups (19 systems) participated in the Japanese subtask, 4 groups (12 systems) in the English subtask, and 2 groups (6 systems) in the Chinese subtask. In addition, 2 baseline systems were constructed for each subtask. The performance of the participant and baseline systems was assessed using exact match accuracy, the F-measure based on precision and recall, and the Hamming loss.
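To make the three evaluation metrics concrete, the following is an illustrative sketch (not the official MedWeb scorer) of how they can be computed for multilabel symptom classification. The toy gold/predicted label vectors and the micro-averaged pooling of the F-measure are assumptions for illustration; the task itself uses 8 symptom labels per tweet.

```python
# Hypothetical multilabel evaluation sketch: each tweet is a 0/1 vector,
# one entry per symptom label (the MedWeb corpus uses 8 labels).

def exact_match_accuracy(golds, preds):
    # Fraction of tweets whose entire label vector is predicted exactly.
    return sum(g == p for g, p in zip(golds, preds)) / len(golds)

def hamming_loss(golds, preds):
    # Fraction of individual label decisions that are wrong.
    total = sum(len(g) for g in golds)
    wrong = sum(gl != pl for g, p in zip(golds, preds)
                for gl, pl in zip(g, p))
    return wrong / total

def micro_f1(golds, preds):
    # F-measure from precision and recall pooled over all label decisions
    # (micro-averaging is an assumption here, not the stated task protocol).
    tp = sum(gl and pl for g, p in zip(golds, preds) for gl, pl in zip(g, p))
    fp = sum((not gl) and pl for g, p in zip(golds, preds) for gl, pl in zip(g, p))
    fn = sum(gl and (not pl) for g, p in zip(golds, preds) for gl, pl in zip(g, p))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# Toy example: 3 tweets, 4 symptom labels each.
golds = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
preds = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
```

On this toy data, only the first tweet's vector matches exactly (accuracy 1/3), 2 of the 12 label decisions are wrong (Hamming loss 1/6), and the pooled F-measure is 2/3; higher accuracy and F-measure and lower Hamming loss indicate better systems.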
The best system achieved an exact match accuracy of 0.880, an F-measure of 0.920, and a Hamming loss of 0.019. The averages of exact match accuracy, F-measure, and Hamming loss were 0.720, 0.820, and 0.051 for the Japanese subtask; 0.770, 0.850, and 0.037 for the English subtask; and 0.810, 0.880, and 0.032 for the Chinese subtask, respectively.
This paper presented and discussed the performance of the systems participating in the NTCIR-13 MedWeb task. Because the MedWeb task setting can be formalized as the factualization of text, achievements on this task could be applied directly to practical clinical applications.