Self-prediction of relations in GO facilitates its quality auditing.

Affiliation

School of Computer Science, University of South China, Hengyang, Hunan, 421001, China.

Publication information

J Biomed Inform. 2023 Aug;144:104441. doi: 10.1016/j.jbi.2023.104441. Epub 2023 Jul 10.

DOI:10.1016/j.jbi.2023.104441
PMID:37437682
Abstract

As applications of the Gene Ontology (GO) increase rapidly in the biomedical field, auditing its quality is becoming increasingly important. Existing auditing methods are mostly based on rules, observed patterns, or hypotheses. In this study, we propose a machine-learning-based framework that lets GO audit itself: we first predict the IS-A relations among concepts in GO, then use the differences between the predictions and the existing relations to uncover potential errors. Specifically, we transform the taxonomy of the GO January 2020 release into a dataset with concept pairs as items and the relations between them as labels (pairs with no direct IS-A relation are labeled as ndrs). To represent each pair fully, we integrate the embeddings of the concept name, the concept definition, and the concept node in a substring-based topological graph. We divide the dataset into 10 parts and rotate over them, each time choosing one part as the testing set and the remaining parts as the training set. After 10 rotations, the prediction model had predicted 4,640 existing IS-A pairs as ndrs. In the GO March 2022 release, 340 of these predictions were validated, a significant result (p = 1.60e-46) when compared with randomly selected pairs. Conversely, the model predicted 2,840 of 17,079 selected ndrs in GO to be IS-A relations. After deleting those that caused redundancies or cycles, 924 predicted IS-A relations remained. Among 200 randomly selected pairs, 30 were validated by domain experts as missing IS-A relations. In conclusion, this study investigates a novel way of auditing biomedical ontologies by predicting the relations within them, which proved useful for discovering potential errors.
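The rotation described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `rotate_folds`, `self_audit`, and `fit_suffix_rule` are hypothetical, and the suffix rule is a toy stand-in for the embedding-based model, which the abstract does not specify in detail. The sketch shows how every labeled pair is predicted exactly once by a model that did not see it during training, and how disagreements in each direction are collected.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]            # (child concept name, parent concept name)
Item = Tuple[Pair, str]           # (pair, label) where label is "IS-A" or "ndr"
Predictor = Callable[[Pair], str]


def rotate_folds(items: Sequence[Item], k: int = 10,
                 seed: int = 0) -> Iterator[Tuple[List[Item], List[Item]]]:
    """Yield (train, test) splits so that each item is tested exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [it for j, fold in enumerate(folds) if j != i for it in fold]
        yield train, folds[i]


def self_audit(items: Sequence[Item],
               fit: Callable[[List[Item]], Predictor],
               k: int = 10, seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Collect pairs whose predicted label disagrees with the existing label."""
    suspicious_isa: List[Pair] = []   # existing IS-A pairs predicted as ndr
    missing_isa: List[Pair] = []      # existing ndr pairs predicted as IS-A
    for train, test in rotate_folds(items, k, seed):
        predict = fit(train)
        for pair, label in test:
            predicted = predict(pair)
            if label == "IS-A" and predicted == "ndr":
                suspicious_isa.append(pair)
            elif label == "ndr" and predicted == "IS-A":
                missing_isa.append(pair)
    return suspicious_isa, missing_isa


def fit_suffix_rule(train: List[Item]) -> Predictor:
    """Toy placeholder (assumption): ignores the training data and predicts
    IS-A when the parent name is a suffix of the child name."""
    return lambda pair: "IS-A" if pair[0].endswith(pair[1]) else "ndr"


toy_items: List[Item] = [
    (("mitochondrial membrane", "membrane"), "IS-A"),
    (("cell cycle", "biological process"), "IS-A"),
    (("nuclear membrane", "membrane"), "ndr"),
    (("nucleus", "membrane"), "ndr"),
]
suspicious, missing = self_audit(toy_items, fit_suffix_rule, k=2)
```

In a real setting, `fit` would train a classifier on the training folds; the two returned lists correspond to the two kinds of disagreement the abstract reports.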

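The abstract mentions deleting predicted IS-A relations that caused redundancies or cycles but does not state the exact criteria. The following is a minimal sketch under one plausible reading, and the function names are hypothetical: a candidate edge (child, parent) creates a cycle if the child is already an ancestor of the parent, and it is redundant if the parent is already reachable from the child through existing IS-A edges. Each candidate is checked against the existing hierarchy only, not against other candidates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (child, parent), read as "child IS-A parent"


def ancestors(node: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every concept reachable from `node` by following IS-A edges."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def filter_candidates(existing: Iterable[Edge],
                      candidates: Iterable[Edge]) -> List[Edge]:
    """Keep candidate edges that add neither a cycle nor a redundant link."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in existing:
        parents[child].add(parent)

    kept: List[Edge] = []
    for child, parent in candidates:
        # Cycle: the proposed parent already sits below the child.
        if child == parent or child in ancestors(parent, parents):
            continue
        # Redundancy: the relation is already implied by existing edges.
        if parent in ancestors(child, parents):
            continue
        kept.append((child, parent))
    return kept


existing_edges = [("B", "A"), ("C", "B")]   # C IS-A B IS-A A
candidate_edges = [("A", "C"), ("C", "A"), ("D", "A"), ("A", "A")]
kept_edges = filter_candidates(existing_edges, candidate_edges)
```

Here only `("D", "A")` survives: `("A", "C")` and `("A", "A")` would introduce cycles, and `("C", "A")` is already implied by the path through `B`.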

Similar articles

1
Self-prediction of relations in GO facilitates its quality auditing.
J Biomed Inform. 2023 Aug;144:104441. doi: 10.1016/j.jbi.2023.104441. Epub 2023 Jul 10.
2
An evidence-based lexical pattern approach for quality assurance of Gene Ontology relations.
Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac122.
3
Matching biomedical ontologies with GCN-based feature propagation.
Math Biosci Eng. 2022 Jun 9;19(8):8479-8504. doi: 10.3934/mbe.2022394.
4
Multi-Ontology Refined Embeddings (MORE): A hybrid multi-ontology and corpus-based semantic representation model for biomedical concepts.
J Biomed Inform. 2020 Nov;111:103581. doi: 10.1016/j.jbi.2020.103581. Epub 2020 Oct 1.
5
SSIF: Subsumption-based Sub-term Inference Framework to audit Gene Ontology.
Bioinformatics. 2020 May 1;36(10):3207-3214. doi: 10.1093/bioinformatics/btaa106.
6
Complex overlapping concepts: An effective auditing methodology for families of similarly structured BioPortal ontologies.
J Biomed Inform. 2018 Jul;83:135-149. doi: 10.1016/j.jbi.2018.05.015. Epub 2018 May 28.
7
An efficient, large-scale, non-lattice-detection algorithm for exhaustive structural auditing of biomedical ontologies.
J Biomed Inform. 2018 Apr;80:106-119. doi: 10.1016/j.jbi.2018.03.004. Epub 2018 Mar 13.
8
mOWL: Python library for machine learning with biomedical ontologies.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac811.
9
Identification of missing hierarchical relations in the vaccine ontology using acquired term pairs.
J Biomed Semantics. 2022 Aug 13;13(1):22. doi: 10.1186/s13326-022-00276-2.
10
Multi-domain knowledge graph embeddings for gene-disease association prediction.
J Biomed Semantics. 2023 Aug 14;14(1):11. doi: 10.1186/s13326-023-00291-x.