Semantic relation mining of solid compounds in medical corpora.

Author information

Kokkinakis Dimitrios

Affiliation

Department of Swedish Language, Språkdata, University of Gothenburg, Sweden.

Publication information

Stud Health Technol Inform. 2008;136:217-22.

Abstract

In scientific and technical texts, meaning is usually embedded in noun compounds, and the semantic interpretation of these compounds involves detecting and semantically classifying the relation that holds between the compound's constituents. Semantic relation mining, the technology applied for marking up, interpreting, extracting and classifying relations that hold between pairs of words, is an important enterprise that contributes to enhancing document understanding technologies such as Information Extraction, Question Answering, Summarization, Paraphrasing, Ontology Building and Textual Entailment. This paper explores assigning semantic descriptors taken from a multilingual medical thesaurus to a large sample of solid (closed form) compounds taken from large Swedish medical corpora, and determining the relation(s) that may hold between the compound constituents. Our work is inspired by previous research on using lexical hierarchies to identify relations within two-word noun compounds in the medical domain. Compared with that research, Swedish, like other Germanic languages, requires further means of analysis, since compounds are written as one sequence with no white space between the words, e.g. virus diseases vs. virussjukdomar. This makes the problem more challenging, since solid compounds are harder to identify and segment.
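To make the segmentation problem concrete, the following is a minimal sketch, not the method used in the paper. It assumes a tiny hand-made lexicon in place of the thesaurus, hypothetical descriptor labels ("Organisms", "Diseases", and so on), and a simplified treatment of the Swedish linking element -s-. The function names `split_compound` and `descriptor_pairs` are illustrative only.

```python
# Toy lexicon mapping Swedish constituents to descriptor labels.
# ASSUMPTION: entries and labels are hypothetical, chosen for illustration;
# they are not taken from the paper or from any real thesaurus.
LEXICON = {
    "virus": "Organisms",
    "sjukdomar": "Diseases",
    "graviditet": "Physiological Phenomena",
    "diabetes": "Diseases",
}

# Swedish compounds may insert a linking -s- between constituents.
# ASSUMPTION: only the empty link and "s" are handled here.
LINKING_ELEMENTS = ("", "s")


def split_compound(word, lexicon=LEXICON, min_len=3):
    """Return all (modifier, head) splits whose parts are both in the lexicon."""
    word = word.lower()
    splits = []
    # Try every split point that leaves at least min_len characters per side.
    for i in range(min_len, len(word) - min_len + 1):
        left, head = word[:i], word[i:]
        if head not in lexicon:
            continue
        for link in LINKING_ELEMENTS:
            if not left.endswith(link):
                continue
            # Strip the linking element (if any) before the lexicon lookup.
            modifier = left[: len(left) - len(link)] if link else left
            if modifier in lexicon:
                splits.append((modifier, head))
    return splits


def descriptor_pairs(word, lexicon=LEXICON):
    """Map each valid split to its (modifier descriptor, head descriptor) pair."""
    return [(lexicon[m], lexicon[h]) for m, h in split_compound(word, lexicon)]


if __name__ == "__main__":
    for compound in ("virussjukdomar", "graviditetsdiabetes"):
        print(compound, split_compound(compound), descriptor_pairs(compound))
```

A descriptor pair such as ("Organisms", "Diseases") is the kind of input from which a relation between the constituents could then be proposed. A realistic system would also need to rank competing segmentations and handle constituents missing from the lexicon, which this sketch does not attempt.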
