Gan Mingxin, Dou Xue, Jiang Rui
Dongling School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China.
ScientificWorldJournal. 2013;2013:793091. doi: 10.1155/2013/793091. Epub 2013 Feb 28.
Advances in high-throughput experimental techniques over the past decade have led to an explosive increase in omics data, and the effective organization, interpretation, and exchange of these data require standard, controlled vocabularies for biological and biomedical studies. Ontologies, as abstract description systems for composing domain-specific knowledge, have therefore received increasing attention in computational biology and bioinformatics. In particular, many applications that rely on domain ontologies require quantitative measures of the relationships between ontology terms, which makes computational methods for deriving ontology-based semantic similarity between terms indispensable. However, with so many methods available, choosing a suitable one for a specific application has become a problem. We therefore review most of the existing methods that use ontologies to calculate semantic similarity between terms. We classify these methods into five categories: methods based on semantic distance, methods based on information content, methods based on properties of terms, methods based on ontology hierarchy, and hybrid methods. We summarize the characteristics of each category, emphasizing the basic notions, advantages, and disadvantages of its methods. Finally, we extend the review to software tools that implement these methods and to applications that use them.
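The following sketch makes two of the five categories concrete. It is a minimal illustration and is not taken from the review or from any particular tool. It implements a semantic-distance measure, the shortest path between two terms with is-a links treated as undirected. It also implements Resnik's information-content measure, the IC of the most informative common ancestor, where IC(t) = -log p(t). The toy ontology, its term names, and the annotation counts are all hypothetical.

```python
import math
from collections import deque

# Hypothetical toy ontology: term -> list of parent terms (a DAG; root has no parents).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A"],
    "E": ["B", "D"],
}

# Hypothetical number of annotations (e.g. genes) made directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 3, "C": 4, "D": 1, "E": 2}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def path_distance(t1, t2):
    """Semantic-distance measure: edges on the shortest path between t1 and t2,
    treating parent-child links as undirected (breadth-first search)."""
    neighbours = {t: set(ps) for t, ps in PARENTS.items()}
    for child, ps in PARENTS.items():
        for p in ps:
            neighbours[p].add(child)
    dist = {t1: 0}
    queue = deque([t1])
    while queue:
        cur = queue.popleft()
        if cur == t2:
            return dist[cur]
        for nxt in neighbours[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return math.inf


def information_content(term):
    """IC(t) = -log p(t), where p(t) is the fraction of all annotations
    made to t or to any of its descendants."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(c for t, c in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(covered / total)


def resnik(t1, t2):
    """Information-content measure (Resnik): IC of the most informative
    common ancestor of t1 and t2."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)


print(path_distance("C", "E"))  # 3 (C-A-D-E)
print(resnik("C", "E"))         # IC of A = -log(9/12)
print(resnik("D", "E"))         # IC of D = -log(3/12)
```

In this toy example, C and E are three edges apart, and their most informative common ancestor is A. D and E share the more specific ancestor D, so they receive a higher Resnik score. The two measures capture different notions of relatedness: one counts edges, and the other weights shared ancestry by how specific that ancestry is.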