Facultad de Informática, Campus de Espinardo, Universidad de Murcia, 30100 Murcia, Spain.
School of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom.
Artif Intell Med. 2015 Sep;65(1):35-48. doi: 10.1016/j.artmed.2014.09.003. Epub 2014 Nov 11.
The main goal of this work is to measure how lexical regularities in biomedical ontology labels can be used for the automatic creation of formal relationships between classes, and to evaluate the results of applying our approach to the Gene Ontology (GO).
In recent years, we have developed a method for the lexical analysis of regularities in biomedical ontology labels, and we have shown that these labels can exhibit a high degree of regularity. In this work, we extend the method with a cross-products extension (CPE) metric, which uses information on exact matches in external ontologies to estimate how promising a specific regularity is for axiomatic enrichment. The GO consortium recently enriched the GO by means of so-called cross-product extensions. Cross-products are generated by establishing axioms that relate a given GO class to classes from the GO or from other biomedical ontologies. We apply our method to the GO and study how well its lexical analysis can identify and reconstruct the cross-products defined by the GO consortium.
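To make the idea of a regularity and an exact external match concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a regularity is a sequence of consecutive words shared by several labels, and that a "CPE-like" score is the fraction of labels containing the regularity whose remaining words exactly match a label in an external ontology. The function names (`find_regularities`, `residual`, `cpe_like_score`), the thresholds, and the toy labels are all hypothetical; the external list is a deliberately incomplete stand-in for Cell Ontology and ChEBI labels, not actual ontology content.

```python
from collections import defaultdict

def word_ngrams(tokens, n):
    """All contiguous n-word sequences of a token list, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def find_regularities(labels, min_labels=2, max_words=3):
    """Map each lower-cased n-gram (1..max_words words) to the set of labels
    containing it, keeping only n-grams shared by at least min_labels labels."""
    index = defaultdict(set)
    for label in labels:
        tokens = label.lower().split()
        for n in range(1, max_words + 1):
            for gram in word_ngrams(tokens, n):
                index[gram].add(label)
    return {gram: found for gram, found in index.items() if len(found) >= min_labels}

def residual(label, regularity):
    """Label text left after removing the first whole-word occurrence of the
    regularity, or None if the label does not contain it."""
    tokens = label.lower().split()
    reg = regularity.lower().split()
    for i in range(len(tokens) - len(reg) + 1):
        if tokens[i:i + len(reg)] == reg:
            return " ".join(tokens[:i] + tokens[i + len(reg):])
    return None

def cpe_like_score(regularity, labels, external_labels):
    """Hypothetical CPE-style score: among labels containing the regularity,
    the fraction whose residual exactly matches an external-ontology label."""
    external = {e.lower() for e in external_labels}
    residuals = [r for r in (residual(l, regularity) for l in labels) if r is not None]
    if not residuals:
        return 0.0
    return sum(r in external for r in residuals) / len(residuals)

# Illustrative toy data only (GO-like labels and stand-in external labels).
go_labels = [
    "glucose biosynthetic process",
    "glycine biosynthetic process",
    "heme biosynthetic process",
    "neuron differentiation",
    "T cell differentiation",
]
external_labels = ["glucose", "glycine", "neuron", "T cell"]

regularities = find_regularities(go_labels)
for reg in ("biosynthetic process", "differentiation"):
    print(reg, len(regularities[reg]), cpe_like_score(reg, go_labels, external_labels))
```

With this toy data, "biosynthetic process" occurs in three labels and two of their residuals ("glucose", "glycine") match the stand-in external list, giving a score of 2/3, while both "differentiation" residuals match, giving 1.0. A higher score would suggest that the regularity is a better candidate for generating axioms that link the GO class to an external class.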
The labels of GO classes are highly regular in lexical terms, and exact matches with labels of external ontologies affect 80% of GO classes. The CPE metric reveals that 31.48% of the classes that exhibit regularities have fragments that are classes in the two external ontologies selected for our experiment, namely the Cell Ontology and the Chemical Entities of Biological Interest (ChEBI) ontology, and that 18.90% of them are fully decomposable into smaller parts. Our results show that the CPE metric allows our method to detect GO cross-product extensions with a mean recall of 62% and a mean precision of 28%. The study concludes with an analysis of false positives that explains this precision value.
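The recall and precision figures compare the cross-products suggested by the lexical analysis with those defined by the GO consortium. A minimal sketch of that comparison follows, assuming each cross-product is reduced to a (GO class, external class) pair and that "mean" denotes a macro-average over several reference sets; both the pair representation and the averaging scheme are assumptions, and the pairs below are toy examples rather than data from the study.

```python
def precision_recall(predicted, reference):
    """Precision and recall of predicted cross-product pairs against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

def mean_precision_recall(runs):
    """Macro-average precision and recall over (predicted, reference) pairs,
    e.g. one per reference cross-product set (an assumed averaging scheme)."""
    scores = [precision_recall(p, r) for p, r in runs]
    mean_p = sum(p for p, _ in scores) / len(scores)
    mean_r = sum(r for _, r in scores) / len(scores)
    return mean_p, mean_r

# Toy (GO label, external class label) pairs; not results from the study.
run_a = (
    {("glucose biosynthetic process", "glucose"), ("heme biosynthetic process", "heme"),
     ("neuron differentiation", "neuron"), ("T cell differentiation", "T cell")},
    {("glucose biosynthetic process", "glucose"), ("neuron differentiation", "neuron")},
)
run_b = (
    {("neuron differentiation", "neuron")},
    {("neuron differentiation", "neuron"), ("T cell differentiation", "T cell")},
)
print(precision_recall(*run_a))                # (0.5, 1.0)
print(mean_precision_recall([run_a, run_b]))   # (0.75, 0.75)
```

In `run_a`, the two extra predicted pairs absent from the reference set lower precision without affecting recall. This is the general mechanism by which a lexical method can reach a higher recall than precision when it proposes more links than the reference set contains.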
We believe that our results support the claim that our lexical approach can contribute to the axiomatic enrichment of biomedical ontologies and can provide new insights into biomedical ontology engineering.