OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction.

Affiliation

Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Publication information

Bioinformatics. 2019 Jun 1;35(12):2133-2140. doi: 10.1093/bioinformatics/bty933.

Abstract

MOTIVATION

Ontologies are widely used in biology for data annotation, integration and analysis. In addition to formally structured axioms, ontologies contain meta-data in the form of annotation axioms which provide valuable pieces of information that characterize ontology classes. Annotation axioms commonly used in ontologies include class labels, descriptions or synonyms. Despite being a rich source of semantic information, the ontology meta-data are generally unexploited by ontology-based analysis methods such as semantic similarity measures.

RESULTS

We propose a novel method, OPA2Vec, to generate vector representations of biological entities in ontologies by combining formal ontology axioms and annotation axioms from the ontology meta-data. We apply a Word2Vec model that has been pre-trained on a corpus of either abstracts or full-text articles to produce feature vectors from our collected data. We validate our method in two different ways. First, we use the obtained vector representations of proteins in a similarity measure to predict protein-protein interactions on two different datasets. Second, we evaluate our method on predicting gene-disease associations based on phenotype similarity: we generate vector representations of genes and diseases using a phenotype ontology, and apply the obtained vectors to predict gene-disease associations using mouse model phenotypes. We demonstrate that OPA2Vec significantly outperforms existing methods for predicting gene-disease associations. Using evidence from mouse models, we apply OPA2Vec to identify candidate genes for several thousand rare and orphan diseases. OPA2Vec can be used to produce vector representations of any biomedical entity given any type of biomedical ontology.
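To make the similarity-based prediction step concrete, the following is a minimal sketch of how entity vectors could be compared and ranked. It is not the OPA2Vec implementation: the abstract does not name the similarity measure, so cosine similarity is assumed here as one common choice, and the function names (`cosine_similarity`, `rank_candidates`) and the toy vectors are hypothetical, made up purely for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query, vectors):
    """Rank every other entity by its similarity to `query`, most similar first."""
    scores = [
        (name, cosine_similarity(vectors[query], vec))
        for name, vec in vectors.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical 3-dimensional vectors; real embeddings would have many more
# dimensions and would come from a trained model, not be written by hand.
toy_vectors = {
    "protein_A": [1.0, 0.0, 0.5],
    "protein_B": [0.9, 0.1, 0.4],
    "protein_C": [-0.2, 1.0, 0.0],
}

ranking = rank_candidates("protein_A", toy_vectors)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

In a setting like the one described, the highest-ranked entities for a query protein would be treated as its most likely interaction partners, and the same ranking idea would apply to genes ranked against a disease vector.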

AVAILABILITY AND IMPLEMENTATION

https://github.com/bio-ontology-research-group/opa2vec.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
