Departamento de Informatica y Sistemas, University of Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, Campus de Espinardo, 30100 Murcia, Spain.
Institute of Medical Informatics, Statistics and Documentation, Medical University of Graz, Auenbruggerpl. 2, Graz, Austria.
Biochim Biophys Acta Gene Regul Mech. 2021 Nov-Dec;1864(11-12):194766. doi: 10.1016/j.bbagrm.2021.194766. Epub 2021 Oct 25.
Computational research on gene regulation requires handling and integrating large amounts of heterogeneous data. The Gene Ontology has demonstrated that ontologies play a fundamental role in biological data interoperability and integration. Ontologies help to express data and knowledge in a machine-processable way, which enables complex querying and advanced exploitation of distributed data. Improving data interoperability in gene regulation is a major objective of the GREEKC Consortium, which aims to develop a standardized gene regulation knowledge commons. GREEKC proposes the use of ontologies and semantic tools for developing interoperable gene regulation knowledge models, which should support data annotation. In this work, we study how such knowledge models can be generated from cartoons of gene regulation scenarios. The proposed method consists of four steps: generating natural-language descriptions of the cartoons; extracting the entities from the texts; finding those entities in existing ontologies to reuse as much content as possible, especially from well-known and maintained ontologies such as the Gene Ontology, the Sequence Ontology, the Relation Ontology and ChEBI; and implementing the knowledge models. The models have been implemented using Protégé, a general ontology editor, and Noctua, the tool developed by the Gene Ontology Consortium for building causal activity models, which capture more comprehensive annotations of genes and link their activities in a causal framework for Gene Ontology annotations. We applied the method to two gene regulation scenarios and illustrate how the generated models can support the annotation of data from research articles.
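The entity-extraction and ontology-lookup steps of the method can be illustrated with a minimal sketch. It is not the authors' implementation: the cartoon description, the candidate term list, the function names and the tiny in-memory label table are all assumptions made for illustration, and a real workflow would query the full ontologies rather than a hand-written dictionary.

```python
import re

# Hypothetical stand-in for real ontology files: label -> term identifier.
ONTOLOGY_LABELS = {
    "promoter": "SO:0000167",   # Sequence Ontology
    "gene": "SO:0000704",       # Sequence Ontology
    "dna-binding transcription factor activity": "GO:0003700",  # Gene Ontology
}


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def extract_entities(description, candidates):
    """Return the candidate terms that occur as whole words in the description.

    Longer candidates are checked first so multi-word terms are listed
    before the shorter terms they may contain.
    """
    text = normalize(description)
    found = []
    for term in sorted(candidates, key=len, reverse=True):
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.append(term)
    return found


def map_to_ontology(entities, ontology):
    """Split entities into those with a known term ID and those without."""
    mapped = {e: ontology[e] for e in entities if e in ontology}
    unmapped = [e for e in entities if e not in ontology]
    return mapped, unmapped


# Hypothetical natural-language description of a cartoon.
description = (
    "A transcription factor binds the promoter of a target gene "
    "and recruits a co-repressor."
)
# Hypothetical hand-curated list of entity mentions to look for.
candidates = ["promoter", "gene", "co-repressor"]

entities = extract_entities(description, candidates)
mapped, unmapped = map_to_ontology(entities, ONTOLOGY_LABELS)
print(mapped)
print(unmapped)
```

Entities that land in `unmapped` are the ones that would need either a broader search across other ontologies or a new class in the knowledge model, which is where the reuse-first strategy described in the abstract matters.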