Gaudet-Blavignac Christophe, Foufi Vasiliki, Wehrli Eric, Lovis Christian
Division of Medical Information Sciences, Geneva University Hospitals and University of Geneva.
Laboratoire d'Analyse et de Technologie du Langage, University of Geneva.
Stud Health Technol Inform. 2018;247:710-714.
Medical data is multimodal: it comprises both structured data and narrative data (free text). Narrative data is a type of unstructured data that, although it contains valuable semantic and conceptual information, is rarely reused. To ensure the interoperability of medical data, we propose the automatic annotation of free text with SNOMED CT concepts using Natural Language Processing (NLP) tools. The annotation is performed by a hybrid multilingual syntactic parser. A preliminary evaluation of the annotation shows encouraging results. It confirms that hybrid NLP systems relying heavily on syntax and lexicosemantic resources can achieve semantic enrichment of patient-related narratives.