Heterogeneous networks integration for disease-gene prioritization with node kernels.

Affiliations

Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany.

Department of Mathematics, University of Padova, Padua, Italy.

Publication information

Bioinformatics. 2020 May 1;36(9):2649-2656. doi: 10.1093/bioinformatics/btaa008.

Abstract

MOTIVATION

Identifying disease-gene associations is a task of fundamental importance in human health research. A typical approach first encodes large gene/protein relational datasets as networks, since graphs are a natural and intuitive representation of relationships between objects, and then uses graph-based techniques to prioritize genes for subsequent low-throughput validation assays. Because different types of gene interactions yield distinct gene networks, heterogeneous sources need to be integrated to improve the reliability of prioritization systems.

RESULTS

We propose an approach with three phases: first, we merge all sources into a single network; second, we partition the integrated network according to edge density, introducing a notion of edge type to distinguish the parts; finally, we employ a novel node kernel suitable for graphs with typed edges. We show that the node kernel generates a large number of discriminative features that can be processed efficiently by linear regularized machine learning classifiers. We report state-of-the-art results on 12 disease-gene associations and on a time-stamped benchmark containing 42 newly discovered associations.
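
The three phases can be illustrated with a toy sketch. This is not the authors' implementation (see the DiGI repository for that): the degree-based density rule, the `min_degree` threshold, the two edge-type labels, the per-type edge-count features, and all function names below are assumptions chosen only to make the merge, partition, and kernel steps concrete. The actual node kernel produces far richer features than these counts.

```python
from collections import defaultdict

def merge_networks(sources):
    """Phase 1: union the edge sets of several undirected gene networks."""
    merged = set()
    for edges in sources:
        for u, v in edges:
            if u != v:  # ignore self-loops
                merged.add(frozenset((u, v)))
    return merged

def type_edges(edges, min_degree=3):
    """Phase 2 (assumed rule): an edge is 'dense' when both endpoints
    have degree >= min_degree in the merged network, else 'sparse'."""
    degree = defaultdict(int)
    for edge in edges:
        for node in edge:
            degree[node] += 1
    return {
        edge: "dense" if all(degree[n] >= min_degree for n in edge) else "sparse"
        for edge in edges
    }

def node_features(typed_edges):
    """Phase 3 (simplified): sparse feature vector per node, counting
    incident edges of each type."""
    feats = defaultdict(lambda: defaultdict(int))
    for edge, edge_type in typed_edges.items():
        for node in edge:
            feats[node][edge_type] += 1
    return {node: dict(counts) for node, counts in feats.items()}

def node_kernel(feats, a, b):
    """Kernel value = dot product of the two nodes' sparse feature vectors."""
    fa, fb = feats.get(a, {}), feats.get(b, {})
    return sum(count * fb.get(key, 0) for key, count in fa.items())

# Hypothetical toy sources: gene names and edges are made up.
ppi = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
coexpression = [("A", "B"), ("A", "D"), ("D", "E")]

merged = merge_networks([ppi, coexpression])
typed = type_edges(merged, min_degree=3)
feats = node_features(typed)
```

Because the features are explicit sparse vectors, they could be passed directly to a regularized linear classifier instead of evaluating the kernel pairwise, which is the efficiency argument the abstract makes.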

AVAILABILITY AND IMPLEMENTATION

Source code: https://github.com/dinhinfotech/DiGI.git.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
