SSIF: Subsumption-based Sub-term Inference Framework to audit Gene Ontology.

Author information

School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

Department of Computer Science.

Publication information

Bioinformatics. 2020 May 1;36(10):3207-3214. doi: 10.1093/bioinformatics/btaa106.

Abstract

MOTIVATION

The Gene Ontology (GO) is the unifying biological vocabulary for codifying, managing and sharing biological knowledge. Quality issues in GO, if not addressed, can cause misleading results or missed biological discoveries. Manual identification of potential quality issues in GO is a challenging and arduous task, given its growing size. We introduce an automated auditing approach for suggesting potentially missing is-a relations, which may further reveal erroneous is-a relations.

RESULTS

We developed a Subsumption-based Sub-term Inference Framework (SSIF) by leveraging a novel term-algebra on top of a sequence-based representation of GO concepts along with three conditional rules (monotonicity, intersection and sub-concept rules). Applying SSIF to the October 3, 2018 release of GO suggested 1938 unique potentially missing is-a relations. Domain experts evaluated a random sample of 210 potentially missing is-a relations. The results showed SSIF achieved a precision of 60.61, 60.49 and 46.03% for the monotonicity, intersection and sub-concept rules, respectively.
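The abstract names the three rules without defining them, so the following is only a minimal sketch of the general idea behind a monotonicity-style inference, not the authors' term-algebra or their Java implementation. It assumes concepts are plain word sequences and that a candidate is-a relation arises when replacing an embedded sub-term with one of its known ancestors yields another existing concept that is not yet an ancestor. The function names and the toy hierarchy are hypothetical and are not taken from any GO release.

```python
def ancestors(term, is_a):
    """Return every term reachable from `term` by following is-a edges upward."""
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def split_around(name, sub):
    """Return the (prefix, suffix) word tuples of `name` around the first
    occurrence of the word sequence `sub`, or None if `sub` does not occur."""
    words, sub_words = name.split(), sub.split()
    n = len(sub_words)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == sub_words:
            return tuple(words[:i]), tuple(words[i + n:])
    return None


def suggest_missing_is_a(is_a):
    """Suggest (child, parent) pairs that a monotonicity-style rule would imply
    but that are absent from the transitive closure of `is_a`."""
    terms = set(is_a) | {p for parents in is_a.values() for p in parents}
    closure = {t: ancestors(t, is_a) for t in terms}
    suggestions = set()
    for child_sub in terms:
        for parent_sub in closure[child_sub]:
            for concept in terms:
                if concept == child_sub:
                    continue
                context = split_around(concept, child_sub)
                if context is None:
                    continue
                prefix, suffix = context
                # Swap the embedded sub-term for its ancestor, keeping the context.
                candidate = " ".join(prefix + tuple(parent_sub.split()) + suffix)
                if candidate in terms and candidate not in closure[concept]:
                    suggestions.add((concept, candidate))
    return suggestions


# Toy hierarchy invented for illustration (assumption, not real GO data).
TOY_IS_A = {
    "neutrophil differentiation": ["granulocyte differentiation"],
    "granulocyte differentiation": ["cell differentiation"],
    "regulation of neutrophil differentiation": ["regulation of cell differentiation"],
    "regulation of granulocyte differentiation": ["regulation of cell differentiation"],
}

if __name__ == "__main__":
    for child, parent in sorted(suggest_missing_is_a(TOY_IS_A)):
        print(f"{child}  is-a?  {parent}")
```

In this toy hierarchy the only suggestion is that "regulation of neutrophil differentiation" may be missing an is-a link to "regulation of granulocyte differentiation". As in the evaluation described above, suggestions of this kind are candidates for expert review rather than confirmed relations.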

AVAILABILITY AND IMPLEMENTATION

SSIF is implemented in Java. The source code is available at https://github.com/rashmie/SSIF.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
