Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy.
PROS Research Center, VRAIN Research Institute, Universitat Politècnica de València, Valencia, Spain.
BMC Bioinformatics. 2022 Nov 17;23(Suppl 11):491. doi: 10.1186/s12859-022-05022-0.
Genomics and virology are important but complex domains, investigated by a large number of scientists. Facilitating and supporting work within these domains requires sharing databases, which is often difficult because data is represented differently from one database to another. To foster semantic interoperability, models are needed that provide a deep understanding and interpretation of the concepts in a domain, so that researchers can interpret the data consistently.
In this research, we propose the use of conceptual models to support semantic interoperability among databases, and we assess their ontological clarity to support their effective use. This modeling effort is illustrated by its application to the Viral Conceptual Model (VCM), which captures and represents the sequencing of viruses and was inspired by the need to understand the genomic aspects of the virus responsible for COVID-19. To achieve semantic clarity on the VCM, we leverage the "ontological unpacking" method, a process of ontological analysis that reveals the ontological foundation of the information represented in a conceptual model. This is accomplished by applying the stereotypes of the OntoUML ontology-driven conceptual modeling language. As a result, we propose a new OntoVCM, an ontologically grounded model based on the initial VCM, but with guaranteed interoperability among the data sources that employ it.
We propose and illustrate how the unpacking of the Viral Conceptual Model resolves several issues related to semantic interoperability, the importance of which is recognized by the "I" in the FAIR principles. The research addresses conceptual uncertainty within the domain of SARS-CoV-2 data and knowledge. The method employed provides the basis for further analyses of complex models that are currently used in life science applications but lack ontological grounding, which hinders the interoperability scientists need to progress their research.