Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT), Im Neuenheimer Feld 280, Heidelberg, 69120, Germany.
BMC Genomics. 2024 Sep 16;25(1):869. doi: 10.1186/s12864-024-10759-4.
Bio-ontologies are key to structuring complex biological information for effective data integration and knowledge representation. Semantic similarity analysis on bio-ontologies quantifies how similar biological concepts are, based on the semantics encoded in the ontologies. It plays an important role in the structured, meaningful interpretation and integration of complex data from multiple biological domains.
We present simona, a novel R package for semantic similarity analysis on general bio-ontologies. Simona implements infrastructure for ontology analysis, offering efficient data structures, fast ontology traversal methods, and elegant visualizations. It also provides a robust toolbox supporting over 70 methods for semantic similarity analysis. With simona, we benchmarked current semantic similarity methods. The results demonstrate that methods cluster according to their mathematical methodologies, which can guide researchers in selecting appropriate methods. Additionally, we compared annotation-based and topology-based methods and found that semantic similarities based solely on ontology topology can efficiently reveal semantic similarity structures, facilitating analysis of less-studied organisms and other ontologies.
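To make the idea of a topology-based measure concrete, the following is a minimal Python sketch. It is not simona's implementation or API. It assumes a hypothetical six-term toy ontology, defines the information content (IC) of a term from its number of offspring alone (no gene annotations), and combines the ICs with a Lin-style formula. The names `PARENTS`, `ancestors`, `information_content`, and `lin_similarity` are invented for illustration.

```python
import math

# Hypothetical toy ontology (assumption, not from the paper):
# each term maps to its list of parents; "root" is the single root.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],
    "e": ["b"],
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Topology-based IC: -log of the fraction of terms that are
    the term itself or one of its offspring."""
    n_self_and_offspring = sum(1 for t in PARENTS if term in ancestors(t))
    return -math.log(n_self_and_offspring / len(PARENTS))


def lin_similarity(t1, t2):
    """Lin-style similarity: 2 * IC(most informative common ancestor)
    divided by IC(t1) + IC(t2)."""
    common = ancestors(t1) & ancestors(t2)
    ic_mica = max(information_content(t) for t in common)
    denom = information_content(t1) + information_content(t2)
    if denom == 0:
        # Only the root has IC 0, so both terms are the root.
        return 1.0
    return 2 * ic_mica / denom


if __name__ == "__main__":
    for pair in [("c", "d"), ("c", "e"), ("c", "c")]:
        print(pair, round(lin_similarity(*pair), 3))
```

In this sketch, terms `c` and `d` share the informative ancestor `a` and receive a positive similarity, whereas `c` and `e` share only the root and receive zero. Because nothing but the graph structure is used, the same computation applies to ontologies or organisms that lack annotation data.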
Simona offers a versatile interface and efficient implementation for processing, visualization, and semantic similarity analysis on bio-ontologies. We believe that simona will serve as a robust tool for uncovering relationships and enhancing the interoperability of biological knowledge systems.