Nazarenko A, Zweigenbaum P, Bouaud J, Habert B
Laboratoire d'Informatique de Paris-Nord, Université Paris 13.
Proc AMIA Annu Fall Symp. 1997:585-9.
Medical Language Processing (MLP), especially in specific domains, requires fine-grained semantic lexica. We examine whether robust natural language processing tools, applied to a representative corpus of a domain, help build and refine a semantic categorization. We test this hypothesis with ZELLIG, a corpus analysis tool. The first clusters we obtain are consistent with a model of the domain, as found in the SNOMED nomenclature. They correspond to coarse-grained semantic categories, but they also isolate lexical idiosyncrasies belonging to the clinical sublanguage. Moreover, they help categorize additional words.