FEDRR: fast, exhaustive detection of redundant hierarchical relations for quality improvement of large biomedical ontologies.

Authors

Xing Guangming, Zhang Guo-Qiang, Cui Licong

Affiliations

Department of Computer Science, Western Kentucky University, Bowling Green, KY 42101, USA.

Institute of Biomedical Informatics, University of Kentucky, Lexington, KY 40536, USA.

Publication information

BioData Min. 2016 Oct 10;9:31. doi: 10.1186/s13040-016-0110-8. eCollection 2016.

DOI:10.1186/s13040-016-0110-8

PMID:27777627

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5057496/

Abstract

BACKGROUND

Redundant hierarchical relations are patterns in which two paths lead from one concept to another: one of length one (direct) and the other of length greater than one (indirect). Each redundant relation represents a possibly unintended defect that needs to be corrected in the ontology quality assurance process. Detecting and eliminating redundant relations would help improve the results of all methods that rely on the relevant ontological systems as a knowledge source, such as computing semantic distance between concepts and ontology matching and alignment.
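To make the definition concrete, the following is a minimal Python sketch, not taken from the paper, that tests a single direct is-a relation for redundancy by searching for an indirect path to the same target. The `parents` mapping, the function name `is_redundant`, and the concept names are hypothetical and chosen only for illustration.

```python
def is_redundant(parents, child, parent):
    """Return True if `child -> parent` is a direct is-a relation that is
    also implied by an indirect path of length greater than one.

    `parents` maps each concept to the set of its direct parents
    (an assumed in-memory representation of the hierarchy).
    """
    if parent not in parents.get(child, set()):
        return False  # not a direct relation at all

    # Start from the child's other direct parents and walk upward.
    stack = [p for p in parents[child] if p != parent]
    seen = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for up in parents.get(node, set()):
            if up == parent:
                return True  # reached `parent` via a path of length >= 2
            stack.append(up)
    return False


# Toy hierarchy (invented names): A is-a B, B is-a C, plus a direct A is-a C.
toy = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
print(is_redundant(toy, "A", "C"))  # True: A -> B -> C already implies A -> C
print(is_redundant(toy, "A", "B"))  # False: no indirect path from A to B
```

Checking every relation this way repeats a graph search per edge, which is the cost an exhaustive method on a large ontology needs to avoid.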

RESULTS

This paper introduces a novel and scalable approach, called FEDRR (Fast, Exhaustive Detection of Redundant Relations), for quality assurance work during ontological evolution. FEDRR combines the algorithmic ideas of dynamic programming and topological sort to exhaustively mine all redundant hierarchical relations in ontological hierarchies in O(c·|V| + |E|) time, where |V| is the number of concepts, |E| is the number of relations, and c is a constant in practice. Using FEDRR, we performed an exhaustive search for all redundant is-a relations in two of the largest ontological systems in biomedicine: SNOMED CT and Gene Ontology (GO). We found 372 redundant is-a relations in the 2015-09-01 version of SNOMED CT and 1609 in the 2015-05-01 version of GO. We also ran FEDRR on over 190 source vocabularies in the UMLS, a large integrated repository of biomedical ontologies, and identified six sources containing redundant is-a relations. Randomly generated ontologies were also used to further validate the efficiency of FEDRR.
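The general idea of combining a topological sort with dynamic programming can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' FEDRR implementation: it processes concepts in topological order and reuses each parent's already computed ancestor set, and it makes no claim to match the running time reported in the paper. The function name `find_redundant_relations` and the toy hierarchy are hypothetical; `graphlib.TopologicalSorter` is part of the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def find_redundant_relations(parents):
    """Return all redundant direct is-a relations as sorted (child, parent) pairs.

    `parents` maps each concept to an iterable of its direct parents
    (an assumed representation; the hierarchy must be acyclic).
    """
    # TopologicalSorter expects node -> predecessors, so passing `parents`
    # yields every concept after all of its parents.
    order = TopologicalSorter(parents).static_order()

    ancestors = {}  # concept -> set of all its ancestors (the DP table)
    redundant = []
    for node in order:
        direct = set(parents.get(node, ()))
        # Everything reachable through a direct parent by one or more steps.
        inherited = set()
        for p in direct:
            inherited |= ancestors[p]
        # A direct parent that is also inherited is reachable by a longer path.
        for p in direct:
            if p in inherited:
                redundant.append((node, p))
        ancestors[node] = direct | inherited
    return sorted(redundant)


# Toy hierarchy (invented names) with two redundant relations.
toy = {"A": {"B", "C"}, "B": {"C"}, "C": set(), "D": {"A", "C"}}
print(find_redundant_relations(toy))  # [('A', 'C'), ('D', 'C')]
```

Storing a full ancestor set per concept is simple but can be memory-hungry on hierarchies the size of SNOMED CT; a production implementation would likely need a more compact representation.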

CONCLUSIONS

FEDRR provides a generally applicable, effective tool for systematically detecting redundant relations in large ontological systems for quality improvement.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c96c/5057496/decdcfaee552/13040_2016_110_Fig1_HTML.jpg
