Alm Rebekka, Waltemath Dagmar, Wolfien Markus, Wolkenhauer Olaf, Henkel Ron
Department of Multimedia Communication, University of Rostock, Joachim-Jungius-Str. 11, Rostock, 18051 Germany ; Fraunhofer Institute for Computer Graphics Research IGD, Joachim-Jungius-Str. 11, Rostock, 18059 Germany.
Department of Systems Biology and Bioinformatics, University of Rostock, Ulmenstr. 69, Rostock, 18051 Germany.
J Biomed Semantics. 2015 Apr 15;6:20. doi: 10.1186/s13326-015-0014-4. eCollection 2015.
Model repositories such as BioModels Database provide computational models of biological systems for the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics improve model classification, help identify additional features for model retrieval tasks, and enable the comparison of sets of models.
In this paper we discuss four methods for annotation-based feature extraction from model sets. We tested all methods on sets of models in SBML format that were drawn from BioModels Database. To characterize each of these sets, we analyzed and extracted concepts from three frequently used ontologies, namely Gene Ontology, ChEBI and SBO. We find that three of the four methods are suitable for determining characteristic features of arbitrary sets of models: the selected features vary depending on the underlying model set, and they are also specific to the chosen model set. We show that the identified features map to concepts that are higher up in the hierarchy of the ontologies than the concepts used for model annotations. Our analysis also reveals that the information content of concepts in ontologies and their usage for model annotation do not correlate.
Annotation-based feature extraction enables the comparison of model sets, as opposed to existing methods for model-to-keyword comparison, or model-to-model comparison.
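The abstract does not spell out the four methods, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a hypothetical toy ontology (`PARENTS`), a hypothetical model set given as a mapping from model name to annotated concepts, and a made-up support threshold (`min_fraction`). Each annotation is propagated to its ancestors, concepts are counted per model, and the most specific concepts shared by most models are kept as features. A frequency-based information content, -log(p), is included as one common definition; the paper may use a different one.

```python
import math
from collections import Counter

# Hypothetical toy ontology: concept -> set of direct parent concepts.
PARENTS = {
    "glycolysis": {"metabolic process"},
    "gluconeogenesis": {"metabolic process"},
    "metabolic process": {"biological process"},
    "cell cycle": {"biological process"},
    "biological process": set(),
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in PARENTS."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def model_support(model_set):
    """Count, per concept, how many models are annotated with it or a descendant."""
    counts = Counter()
    for annotations in model_set.values():
        covered = set()
        for term in annotations:
            covered |= ancestors(term)
        counts.update(covered)  # each model contributes at most once per concept
    return counts


def characteristic_features(model_set, min_fraction=0.75):
    """Most specific concepts covering at least min_fraction of the models."""
    counts = model_support(model_set)
    n_models = len(model_set)
    frequent = {t for t, c in counts.items() if c / n_models >= min_fraction}
    # Drop concepts that are proper ancestors of another frequent concept.
    redundant = set()
    for term in frequent:
        redundant |= ancestors(term) - {term}
    return frequent - redundant


def information_content(term, counts, n_models):
    """Frequency-based information content: -log of the fraction of models covered."""
    return -math.log(counts[term] / n_models)


# Hypothetical model set: model name -> concepts used in its annotations.
models = {
    "m1": {"glycolysis"},
    "m2": {"gluconeogenesis"},
    "m3": {"glycolysis", "cell cycle"},
}
print(characteristic_features(models))  # {'metabolic process'}
```

In this toy example no single annotated concept covers enough models, but their shared parent does, so the extracted feature sits one level above the concepts used for annotation.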