Department of Artificial Intelligence and Computing Systems, University of Havana, San Lázaro y L. Edificio Felipe Poey, Plaza de la Revolución, Havana, Cuba.
Department of Software and Computing Systems, University of Alicante, Carretera San Vicente del Raspeig s/n, 03690 Alicante, Spain.
J Biomed Inform. 2019 Jun;94:103172. doi: 10.1016/j.jbi.2019.103172. Epub 2019 Apr 6.
This paper presents the eHealth-KD corpus, a collection of 1173 Spanish health-related sentences manually annotated with a general semantic structure that captures most of their content without resorting to domain-specific labels. The semantic representation is first defined and illustrated with example sentences from the corpus. Next, the paper summarizes the annotation process and provides key metrics of the corpus. Finally, three baseline implementations supported by machine learning models were designed to gauge the complexity of learning the corpus semantics. The resulting corpus was used as an evaluation scenario in TASS 2018 (Martínez-Cámara et al., 2018), and the findings obtained by participants are discussed. The eHealth-KD corpus provides the first step toward the design of a general-purpose semantic framework that can be used to extract knowledge from a variety of domains.