Couto Francisco M, Silva Mário J
Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa, Lisboa, 1749-016, Portugal.
J Biomed Semantics. 2011 Aug 31;2:5. doi: 10.1186/2041-1480-2-5.
The large-scale effort in developing, maintaining and making biomedical ontologies available motivates the application of similarity measures to compare ontology concepts or, by extension, the entities described therein. A common approach, known as semantic similarity, compares ontology concepts through the information content they share in the ontology. However, semantic similarity measures frequently neglect, or do not properly explore, the different disjunctive ancestors that concepts have in the ontology.
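The abstract does not name a specific measure, but a widely used way of computing shared information content is Resnik's: the IC of the most informative common ancestor (MICA), with IC(c) = -log p(c) estimated from how often c or any of its descendants is used in annotations. The sketch below illustrates that standard formulation under stated assumptions. The toy ontology, the annotation counts and the function names are all hypothetical, and this is not code from the paper.

```python
import math

# Hypothetical toy ontology: each concept maps to its direct parents (a DAG).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "X": ["A", "B"],  # X has two parents (multiple inheritance)
    "Y": ["A"],
}

# Hypothetical counts of entities (e.g. proteins) annotated directly to each concept.
DIRECT_ANNOTATIONS = {"root": 0, "A": 2, "B": 1, "X": 3, "Y": 4}


def ancestors(concept):
    """Return the concept together with all of its ancestors."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(c) = -log p(c); p(c) counts annotations to c or any of its descendants."""
    freq = {c: 0 for c in PARENTS}
    for concept, count in DIRECT_ANNOTATIONS.items():
        for anc in ancestors(concept):  # an annotation propagates to every ancestor
            freq[anc] += count
    total = freq["root"]
    return {c: -math.log(freq[c] / total) for c in freq}


def resnik(c1, c2, ic):
    """Shared IC as the IC of the most informative common ancestor (MICA)."""
    return max(ic[a] for a in ancestors(c1) & ancestors(c2))


ic = information_content()
print(round(resnik("X", "Y", ic), 3))  # 0.105: A is the MICA, since B is not shared
```

However many common ancestors two concepts have, only one of them contributes to this value.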
This paper proposes a novel method, dubbed DiShIn, that effectively exploits the multiple inheritance relationships present in many biomedical ontologies. DiShIn calculates the shared information content of two ontology concepts based on the information content of the disjunctive common ancestors of the concepts being compared. DiShIn identifies these disjunctive ancestors through the number of distinct paths from the concepts to their common ancestors.
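The path-count idea can be sketched as follows. The grouping rule is an assumed reading of the criterion, not the authors' implementation. For each common ancestor, take the absolute difference between the number of distinct paths reaching it from each concept. Ancestors with the same difference are treated as non-disjunctive, so only the most informative one per group is kept, and the shared IC is the average over the groups. The ontology, the precomputed IC values and all function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy ontology (child -> direct parents); X inherits from both A and B.
PARENTS = {
    "root": (),
    "M": ("root",),
    "A": ("M",),
    "B": ("M",),
    "X": ("A", "B"),
    "Y": ("A",),
}

# Hypothetical, precomputed information content values (IC of the root is 0).
IC = {"root": 0.0, "M": 1.0, "A": 2.0, "B": 2.5, "X": 4.0, "Y": 3.0}


@lru_cache(maxsize=None)
def num_paths(concept, ancestor):
    """Number of distinct upward paths from concept to ancestor (1 if equal, 0 if none)."""
    if concept == ancestor:
        return 1
    return sum(num_paths(parent, ancestor) for parent in PARENTS[concept])


def ancestors(concept):
    """The concept itself plus every concept reachable through its parents."""
    return {a for a in PARENTS if num_paths(concept, a) > 0}


def shared_ic_mica(c1, c2):
    """Traditional shared IC: IC of the most informative common ancestor."""
    return max(IC[a] for a in ancestors(c1) & ancestors(c2))


def shared_ic_dishin(c1, c2):
    """Average IC of the disjunctive common ancestors (assumed grouping rule)."""
    best_by_path_diff = {}
    for anc in ancestors(c1) & ancestors(c2):
        # Assumption: ancestors with equal path-count difference are not disjunctive,
        # so only the most informative ancestor of each group is kept.
        diff = abs(num_paths(c1, anc) - num_paths(c2, anc))
        best_by_path_diff[diff] = max(IC[anc], best_by_path_diff.get(diff, 0.0))
    return sum(best_by_path_diff.values()) / len(best_by_path_diff)


print(shared_ic_mica("X", "Y"))    # 2.0 (A alone)
print(shared_ic_dishin("X", "Y"))  # 1.5 (A and M fall in different groups)
```

Here X reaches M through two paths (via A and via B) while Y reaches it through one, so M lands in a different group from A and contributes to the DiShIn value, whereas the MICA-based value uses A alone.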
DiShIn was applied to the Gene Ontology, and its performance was evaluated against state-of-the-art measures using CESSM, a publicly available platform for evaluating protein similarity measures. By modifying the way traditional semantic similarity measures calculate the shared information content, DiShIn obtained a statistically significantly higher correlation between semantic and sequence similarity. Moreover, incorporating DiShIn into existing applications that exploit multiple inheritance would reduce their execution time.
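Because DiShIn changes only how the shared IC is computed, it can be dropped into any measure built on that term. The sketch below takes the shared IC as a parameter and applies the standard definitions of Resnik, Lin and the Jiang-Conrath distance. The concept IC values (4.0 and 3.0) and the two shared-IC values (2.0 from a MICA-style calculation, 1.5 from a DiShIn-style one) are hypothetical numbers chosen for illustration.

```python
def resnik(shared_ic):
    """Resnik: the similarity is the shared information content itself."""
    return shared_ic


def lin(ic1, ic2, shared_ic):
    """Lin: shared IC normalised by the IC of both concepts."""
    return 2 * shared_ic / (ic1 + ic2)


def jiang_conrath_distance(ic1, ic2, shared_ic):
    """Jiang-Conrath: information not shared by the two concepts (a distance)."""
    return ic1 + ic2 - 2 * shared_ic


# Hypothetical IC values of the two concepts being compared.
ic1, ic2 = 4.0, 3.0
for label, shared in (("MICA", 2.0), ("DiShIn", 1.5)):
    print(label, resnik(shared), round(lin(ic1, ic2, shared), 3),
          jiang_conrath_distance(ic1, ic2, shared))
# MICA 2.0 0.571 3.0
# DiShIn 1.5 0.429 4.0
```

Only the shared-IC argument differs between the two rows; the measures themselves are unchanged.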