Using automatically learnt verb selectional preferences for classification of biomedical terms.

Authors

Spasić Irena, Ananiadou Sophia

Affiliation

Department of Chemistry, UMIST, Faraday Building, P.O. Box 88, Sackville Street, Manchester M60 1QD, UK.

Publication information

J Biomed Inform. 2004 Dec;37(6):483-97. doi: 10.1016/j.jbi.2004.08.002.

DOI:10.1016/j.jbi.2004.08.002

PMID:15542021

Abstract

In this paper, we present an approach to term classification based on verb selectional patterns (VSPs), where such a pattern is defined as a set of semantic classes that could be used in combination with a given domain-specific verb. VSPs have been automatically learnt based on the information found in a corpus and an ontology in the biomedical domain. Prior to the learning phase, the corpus is terminologically processed: term recognition is performed by both looking up the dictionary of terms listed in the ontology and applying the C/NC-value method for on-the-fly term extraction. Subsequently, domain-specific verbs are automatically identified in the corpus based on the frequency of occurrence and the frequency of their co-occurrence with terms. VSPs are then learnt automatically for these verbs. Two machine learning approaches are presented. The first approach has been implemented as an iterative generalisation procedure based on a partial order relation induced by the domain-specific ontology. The second approach exploits the idea of genetic algorithms. Once the VSPs are acquired, they can be used to classify newly recognised terms co-occurring with domain-specific verbs. Given a term, the most frequently co-occurring domain-specific verb is selected. Its VSP is used to constrain the search space by focusing on potential classes of the given term. A nearest-neighbour approach is then applied to select a class from the constrained space of candidate classes. The most similar candidate class is predicted for the given term. The similarity measure used for this purpose combines contextual, lexical, and syntactic properties of terms.
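The classification step described in the abstract (pick the most frequently co-occurring domain-specific verb, restrict the candidate classes to that verb's VSP, then choose the class of the nearest neighbour) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy verbs, classes and terms, and in particular the `token_overlap` similarity are all assumptions made for illustration; the paper's actual measure combines contextual, lexical, and syntactic properties, which a simple token overlap does not capture.

```python
from collections import Counter


def token_overlap(a: str, b: str) -> float:
    """Stand-in similarity (assumption): Jaccard overlap of lowercase word tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def classify_term(term, verb_counts, vsps, class_members, similarity=token_overlap):
    """Predict a class for `term`, or None if no verb with a learnt VSP co-occurs.

    verb_counts   -- how often each verb co-occurs with the term
    vsps          -- verb -> set of semantic classes (its selectional pattern)
    class_members -- class -> already classified terms
    """
    # Step 1: keep only verbs that have a VSP and take the most frequent one.
    known = {verb: n for verb, n in verb_counts.items() if verb in vsps}
    if not known:
        return None
    verb = max(known, key=known.get)

    # Step 2: the verb's VSP constrains the space of candidate classes.
    candidate_classes = vsps[verb]

    # Step 3: nearest neighbour over the terms of the candidate classes only.
    best_class, best_score = None, -1.0
    for cls in sorted(candidate_classes):  # sorted for deterministic tie-breaking
        for known_term in class_members.get(cls, ()):
            score = similarity(term, known_term)
            if score > best_score:
                best_class, best_score = cls, score
    return best_class


# Toy data, purely hypothetical.
vsps = {
    "activate": {"protein", "receptor"},
    "bind": {"protein", "dna"},
}
class_members = {
    "protein": ["protein kinase C", "tyrosine kinase"],
    "receptor": ["retinoic acid receptor", "estrogen receptor"],
    "dna": ["promoter region"],
}
verb_counts = Counter({"activate": 5, "bind": 2})

predicted = classify_term("glucocorticoid receptor", verb_counts, vsps, class_members)
print(predicted)
```

In this sketch the VSP acts purely as a filter: a class outside the selected verb's pattern is never considered, however similar its member terms are to the new term. A term that co-occurs with no verb for which a VSP is available is left unclassified.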
