Enumerating consistent sub-graphs of directed acyclic graphs: an insight into biomedical ontologies.

Affiliation

Department of Computer Science, Indiana University, Bloomington, USA.

Publication

Bioinformatics. 2018 Jul 1;34(13):i313-i322. doi: 10.1093/bioinformatics/bty268.

PMID:29949985

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6022688/

Abstract

MOTIVATION

Modern problems of concept annotation associate an object of interest (gene, individual, text document) with a set of interrelated textual descriptors (functions, diseases, topics), often organized in concept hierarchies or ontologies. Most ontologies can be seen as directed acyclic graphs (DAGs), where nodes represent concepts and edges represent relational ties between these concepts. Given an ontology graph, each object can only be annotated by a consistent sub-graph; that is, a sub-graph such that if an object is annotated by a particular concept, it must also be annotated by all other concepts that generalize it. Ontologies therefore provide a compact representation of a large space of possible consistent sub-graphs; however, until now we have not been aware of a practical algorithm that can enumerate such annotation spaces for a given ontology.
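
To make the consistency requirement concrete, here is a minimal brute-force sketch in Python. It assumes the ontology is given as a hypothetical `parents` dictionary mapping each concept to the concepts that directly generalize it, and it counts the empty annotation as one consistent sub-graph; both are illustrative choices, not necessarily the authors' conventions. Because it checks every subset of nodes, it runs in exponential time and is only useful as a reference on tiny graphs.

```python
from itertools import combinations


def is_consistent(subset, parents):
    """True if every chosen concept's direct generalizations are also chosen.

    Checking direct parents suffices: closure under parents implies
    closure under all ancestors.
    """
    return all(p in subset for v in subset for p in parents[v])


def count_consistent_bruteforce(parents):
    """Count consistent sub-graphs by testing every node subset (exponential time).

    Assumption: `parents` maps node -> list of direct parents, and the empty
    annotation is counted as one consistent sub-graph.
    """
    nodes = list(parents)
    total = 0
    for k in range(len(nodes) + 1):
        for combo in combinations(nodes, k):
            if is_consistent(set(combo), parents):
                total += 1
    return total


# Hypothetical toy ontology: r generalizes a and b; c specializes both a and b.
diamond = {"r": [], "a": ["r"], "b": ["r"], "c": ["a", "b"]}
print(count_consistent_bruteforce(diamond))  # 6
```

The six consistent sets for this toy graph are the empty set, {r}, {r, a}, {r, b}, {r, a, b} and {r, a, b, c}; any set containing c without both a and b is rejected.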

RESULTS

We propose an algorithm for enumerating consistent sub-graphs of DAGs. The algorithm recursively partitions the graph into strictly smaller graphs until the resulting graph becomes a rooted tree (forest), for which a linear-time solution is computed. It then combines the tallies from graphs created in the recursion to obtain the final count. We prove the correctness of this algorithm, propose several practical accelerations, evaluate it on random graphs and then apply it to characterize four major biomedical ontologies. We believe this work provides valuable insights into the complexity of concept annotation spaces and its potential influence on the predictability of ontological annotation.
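
The recursion described above can be illustrated with a short Python sketch. It is one possible realization under stated assumptions, not the authors' implementation: it uses a hypothetical `parents` dictionary (node to direct parents), counts the empty set, and partitions the graph by conditioning on a node `v` that has more than one parent. Excluding `v` forbids all of its descendants, while including `v` forces all of its ancestors, so each branch leaves a strictly smaller graph. Once every node has at most one parent the graph is a forest, and the count follows from the linear-time product rule f(v) = ∏ (1 + f(c)) over the children c of v. The choice of branching node here is arbitrary, and the worst-case running time remains exponential.

```python
def induced(parents, keep):
    """Restrict the parent map to `keep`, dropping edges to removed nodes."""
    return {v: [p for p in parents[v] if p in keep] for v in keep}


def closure(start, step):
    """All nodes reachable from `start` by repeatedly applying `step` (start included)."""
    seen, stack = {start}, [start]
    while stack:
        for u in step(stack.pop()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def count_forest(parents):
    """Linear-time count for a forest (every node has at most one parent).

    f(v) = prod over children c of (1 + f(c)) counts consistent sets whose
    topmost node is v; the forest total, empty set included, is
    prod over roots r of (1 + f(r)).
    """
    children = {v: [] for v in parents}
    roots = []
    for v, ps in parents.items():
        if ps:
            children[ps[0]].append(v)
        else:
            roots.append(v)
    # Pre-order traversal; processing it in reverse visits children before parents.
    order, stack = [], list(roots)
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    f = {}
    for v in reversed(order):
        prod = 1
        for c in children[v]:
            prod *= 1 + f[c]
        f[v] = prod
    total = 1
    for r in roots:
        total *= 1 + f[r]
    return total


def count_consistent(parents):
    """Count consistent (ancestor-closed) node sets of a DAG, empty set included."""
    multi = [v for v, ps in parents.items() if len(ps) > 1]
    if not multi:
        return count_forest(parents)
    v = multi[0]  # arbitrary branching node with several parents
    children = {u: [] for u in parents}
    for u, ps in parents.items():
        for p in ps:
            children[p].append(u)
    nodes = set(parents)
    # Branch 1: v excluded, so none of its descendants may be included.
    without_v = nodes - closure(v, lambda u: children[u])
    # Branch 2: v included, so all of its ancestors are forced in; the remaining
    # nodes only need to be consistent among themselves.
    with_v = nodes - closure(v, lambda u: parents[u])
    return (count_consistent(induced(parents, without_v))
            + count_consistent(induced(parents, with_v)))


diamond = {"r": [], "a": ["r"], "b": ["r"], "c": ["a", "b"]}
print(count_consistent(diamond))  # 6
```

On the toy diamond graph, branching on c gives 5 sets without c (the tree r, a, b) plus 1 set with c (every node forced), for a total of 6.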

AVAILABILITY AND IMPLEMENTATION

https://github.com/shawn-peng/counting-consistent-sub-DAG.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
