Exploring automatic inconsistency detection for literature-based gene ontology annotation.

Affiliations

School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia.

School of Computer Technologies, RMIT University, Melbourne, VIC 3000, Australia.

Publication information

Bioinformatics. 2022 Jun 24;38(Suppl 1):i273-i281. doi: 10.1093/bioinformatics/btac230.

Abstract

MOTIVATION

Literature-based gene ontology annotations (GOA) are biological database records that use a controlled vocabulary to uniformly represent gene function information described in the primary literature. Assuring the quality of GOA is crucial for supporting biological research. However, a range of different kinds of inconsistencies can be identified between the literature cited as evidence and the annotated GO terms; these have not been systematically studied at the record level. The existing manual-curation approach to GOA consistency assurance is inefficient and cannot keep pace with the rate of updates to gene function knowledge. Automatic tools are therefore needed to assist with GOA consistency assurance. This article presents an exploration of different GOA inconsistencies and an early feasibility study of automatic inconsistency detection.

RESULTS

We have created a reliable synthetic dataset to simulate four realistic types of GOA inconsistency in biological databases. Three automatic approaches are proposed. They provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detecting inconsistencies in real-world GOA database records. Major challenges resulting from such inconsistencies in the context of several specific application settings are reported. This is the first study to introduce automatic approaches designed to address the challenges in current GOA quality assurance workflows. The data underlying this article are available on GitHub at https://github.com/jiyuc/AutoGOAConsistency.
