Kokkinakis Dimitrios
Department of Swedish Language, Språkdata, University of Gothenburg, Sweden.
Stud Health Technol Inform. 2008;136:217-22.
In scientific and technical texts, meaning is usually embedded in noun compounds, and the semantic interpretation of these compounds involves detecting and classifying the relation that holds between the compound's constituents. Semantic relation mining, the technology used for marking up, interpreting, extracting and classifying relations that hold between pairs of words, is an important enterprise that contributes to deeper document understanding in technologies such as Information Extraction, Question Answering, Summarization, Paraphrasing, Ontology Building and Textual Entailment. This paper explores assigning semantic descriptors taken from a multilingual medical thesaurus to a large sample of solid (closed-form) compounds drawn from large Swedish medical corpora, and determining the relation(s) that may hold between the compound constituents. Our work is inspired by previous research on using lexical hierarchies to identify relations between the constituents of two-word noun compounds in the medical domain. In contrast to that research, Swedish, like other Germanic languages, requires additional analysis, since compounds are written as a single sequence with no white space between the words, e.g. virus diseases vs. virussjukdomar. This makes the problem more challenging, because solid compounds are harder to identify and segment.
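The segment-then-classify idea described in the abstract can be illustrated with a minimal toy sketch. This is not the author's system: the lexicon, the descriptor categories (`Organisms`, `Diseases`, `Anatomy`, `Concepts`), the relation labels and the functions `segment` and `classify` are all hypothetical, and a real pipeline would draw descriptors from the multilingual medical thesaurus rather than from a hand-written dictionary. The sketch assumes that Swedish compounds are right-headed (the last constituent is the head) and that constituents may be joined by a linking -s- (as in sjukdoms-orsak).

```python
from typing import Optional

# HYPOTHETICAL toy lexicon: surface form -> (lemma, descriptor category).
# Categories loosely imitate top-level thesaurus branches; they are
# illustrative assumptions, not real thesaurus descriptors.
LEXICON: dict[str, tuple[str, str]] = {
    "virus": ("virus", "Organisms"),
    "sjukdom": ("sjukdom", "Diseases"),
    "sjukdomar": ("sjukdom", "Diseases"),  # plural form of "sjukdom"
    "hjärt": ("hjärta", "Anatomy"),        # combining form of "hjärta" (heart)
    "infarkt": ("infarkt", "Diseases"),
    "orsak": ("orsak", "Concepts"),
}

# HYPOTHETICAL relation rules keyed on (modifier category, head category).
RELATIONS: dict[tuple[str, str], str] = {
    ("Organisms", "Diseases"): "CAUSE",
    ("Anatomy", "Diseases"): "LOCATION",
}

# Swedish compounds may insert a linking -s- between constituents.
LINKING_MORPHEMES = ("", "s")


def segment(word: str, min_len: int = 3) -> Optional[list[str]]:
    """Split a solid compound into lexicon entries, trying longer first
    parts before shorter ones and backtracking on failure.
    Returns None if no complete segmentation exists."""
    word = word.lower()
    if word in LEXICON:
        return [word]
    for cut in range(len(word) - min_len, min_len - 1, -1):
        first = word[:cut]
        if first not in LEXICON:
            continue
        rest = word[cut:]
        for link in LINKING_MORPHEMES:
            if not rest.startswith(link):
                continue
            tail = segment(rest[len(link):], min_len)
            if tail:
                return [first] + tail
    return None


def classify(compound: str) -> Optional[tuple[str, str, str]]:
    """Return (modifier lemma, head lemma, relation) for a two-part compound.
    The rightmost constituent is treated as the head."""
    parts = segment(compound)
    if parts is None or len(parts) != 2:
        return None
    (mod_lemma, mod_cat), (head_lemma, head_cat) = (LEXICON[p] for p in parts)
    relation = RELATIONS.get((mod_cat, head_cat), "UNKNOWN")
    return mod_lemma, head_lemma, relation


if __name__ == "__main__":
    for w in ["virussjukdomar", "hjärtinfarkt", "sjukdomsorsak", "xyzabc"]:
        print(w, "->", segment(w), classify(w))
    # virussjukdomar -> ['virus', 'sjukdomar'] ('virus', 'sjukdom', 'CAUSE')
    # hjärtinfarkt -> ['hjärt', 'infarkt'] ('hjärta', 'infarkt', 'LOCATION')
    # sjukdomsorsak -> ['sjukdom', 'orsak'] ('sjukdom', 'orsak', 'UNKNOWN')
    # xyzabc -> None None
```

The search tries longer first constituents before shorter ones and backtracks when the remainder cannot be segmented, which is one simple way to cope with the missing white space. Category pairs with no rule fall back to `UNKNOWN` (as for sjukdomsorsak), marking exactly the gaps that a richer descriptor hierarchy would need to fill.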