Predicting candidate genes from phenotypes, functions and anatomical site of expression.

Affiliations

Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia.

Computer Science Department, College of Computers and Information Technology, Taif University, Taif 26571, Saudi Arabia.

Publication information

Bioinformatics. 2021 May 5;37(6):853-860. doi: 10.1093/bioinformatics/btaa879.

Abstract

MOTIVATION

In recent years, many computational methods have been developed to incorporate phenotype information into the disease-gene prioritization task. These methods generally compute the similarity between a patient's phenotypes and a database of gene-phenotype associations to find the most phenotypically similar match. The main limitation of these methods is their reliance on knowledge about the phenotypes associated with particular genes, which is incomplete in humans as well as in many model organisms, such as the mouse and fish. Information about the functions of gene products and the anatomical sites of gene expression is available for more genes and can also be related to phenotypes through ontologies and machine-learning models.
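The phenotype-matching idea described above can be illustrated with a minimal sketch. It is not the method of any particular tool: it uses plain Jaccard overlap between phenotype sets, whereas published approaches typically use ontology-aware semantic similarity. The gene names, the phenotype identifiers and the helper names `jaccard` and `rank_genes` are all hypothetical and chosen for illustration only.

```python
def jaccard(a, b):
    """Jaccard overlap between two phenotype sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_genes(patient_phenotypes, gene_phenotypes):
    """Rank genes by similarity of their known phenotypes to the patient's.

    Ties are broken alphabetically by gene name so the order is deterministic.
    """
    scores = [
        (gene, jaccard(patient_phenotypes, phenotypes))
        for gene, phenotypes in gene_phenotypes.items()
    ]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Toy gene-phenotype "database"; identifiers are illustrative placeholders.
GENE_PHENOTYPES = {
    "GENE_A": {"HP:0001250", "HP:0001263"},
    "GENE_B": {"HP:0000365"},
    "GENE_C": {"HP:0001250", "HP:0000365", "HP:0004322"},
}

patient = {"HP:0001250", "HP:0001263", "HP:0004322"}
ranking = rank_genes(patient, GENE_PHENOTYPES)

for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

A gene with no recorded phenotypes always scores zero in this scheme, which is exactly the limitation the paragraph above points out.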

RESULTS

We developed a novel graph-based machine-learning method for biomedical ontologies that can exploit axioms in ontologies and other graph-structured data. Using this method, we embed genes based on their associated phenotypes, the functions of their gene products and the anatomical locations of gene expression. We then develop a machine-learning model that predicts gene-disease associations from the associations between genes and multiple biomedical ontologies; this model significantly improves over state-of-the-art methods. Furthermore, we significantly extend phenotype-based gene prioritization methods to all genes that are associated with phenotypes, functions or sites of expression.
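The abstract does not spell out how the graph is turned into embeddings. One common approach for graph-structured data, shown here purely as an assumption, is to generate random walks over the graph and feed them as "sentences" to a skip-gram style model. The sketch below covers only the walk-generation step on a toy graph; the node names, relation labels and function names are hypothetical and are not taken from the DL2Vec code.

```python
import random

# Toy heterogeneous graph: (source, relation, target) triples linking genes
# and a disease to ontology classes. All names are illustrative placeholders.
EDGES = [
    ("GENE_A", "has_phenotype", "PHENO_1"),
    ("GENE_A", "has_function", "FUNC_1"),
    ("GENE_B", "expressed_in", "ANAT_1"),
    ("GENE_B", "has_function", "FUNC_1"),
    ("PHENO_1", "subclass_of", "PHENO_ROOT"),
    ("DISEASE_X", "has_phenotype", "PHENO_1"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> list of (relation, neighbour)."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
        adjacency.setdefault(target, []).append((relation, source))
    return adjacency


def random_walk(adjacency, start, num_steps, rng):
    """Return one walk as [node, relation, node, relation, ...]."""
    walk = [start]
    node = start
    for _ in range(num_steps):
        neighbours = adjacency.get(node)
        if not neighbours:
            break  # dead end: stop the walk early
        relation, node = rng.choice(neighbours)
        walk.extend([relation, node])
    return walk


def generate_corpus(adjacency, walks_per_node, num_steps, seed=0):
    """Generate a fixed number of walks from every node, reproducibly."""
    rng = random.Random(seed)
    corpus = []
    for node in sorted(adjacency):
        for _ in range(walks_per_node):
            corpus.append(random_walk(adjacency, node, num_steps, rng))
    return corpus


ADJACENCY = build_adjacency(EDGES)
CORPUS = generate_corpus(ADJACENCY, walks_per_node=5, num_steps=4)
print(" ".join(CORPUS[0]))
```

In this toy graph, GENE_B has no phenotype annotation but is still connected to DISEASE_X through a shared function and GENE_A, which is the intuition behind extending prioritization to genes that only have function or expression annotations.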

AVAILABILITY AND IMPLEMENTATION

Software and data are available at https://github.com/bio-ontology-research-group/DL2Vec.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
