Prioritizing disease genes with an improved dual label propagation framework.

Author information

College of Software, Nankai University, TianJin, 300350, China.

School of Computer Science and Information Engineering, Tianjin University of Science and Technology, TianJin, 300222, China.

Publication information

BMC Bioinformatics. 2018 Feb 8;19(1):47. doi: 10.1186/s12859-018-2040-6.

Abstract

BACKGROUND

Disease gene prioritization aims to identify potential disease-causing genes for a given phenotype; it can help reveal the inherited basis of human diseases and facilitate drug development. Our work is motivated by the label propagation algorithm and by the false positive protein-protein interactions (PPIs) present in the dataset. To the best of our knowledge, false positive PPIs have not previously been considered in disease gene prioritization. Label propagation has been successfully applied to prioritize disease-causing genes in previous network-based methods, which use basic label propagation, i.e. random walk, on networks in different ways. However, none of these methods can handle a dataset that contains many false positive PPIs, because they all take the PPI network as a fixed input. This characteristic of the data source may cause large deviations in the results.
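As an illustration of the basic label propagation that the earlier network-based methods rely on, the following is a minimal sketch and not the IDLP model itself. It assumes the common update rule f <- alpha * S f + (1 - alpha) * y over a symmetrically normalized adjacency matrix S; the function names, the toy four-gene network, and the value alpha = 0.8 are all hypothetical choices made for this example.

```python
def normalize(adj):
    """Symmetrically normalize an adjacency matrix: S = D^-1/2 A D^-1/2."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    # The guard on adj[i][j] also avoids dividing by a zero degree,
    # because an isolated node has no nonzero entries in its row.
    return [[adj[i][j] / (deg[i] * deg[j]) ** 0.5 if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def propagate(adj, seeds, alpha=0.8, tol=1e-9, max_iter=1000):
    """Iterate f <- alpha * S f + (1 - alpha) * y until the scores stabilize."""
    n = len(adj)
    s = normalize(adj)
    # y marks the known disease genes (the seed labels).
    y = [1.0 if i in seeds else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(max_iter):
        new_f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f


# Toy PPI network (assumed for illustration): a chain of genes 0 - 1 - 2 - 3.
ppi = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Gene 0 is the only known disease gene; the rest are candidates.
scores = propagate(ppi, seeds={0})
ranking = sorted(range(len(scores)), key=lambda g: -scores[g])
print(ranking)
```

Because `ppi` enters only as a fixed input, every edge, including any false positive, influences the scores; this is the limitation the paragraph above describes.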

RESULTS

We propose IDLP, a novel network-based framework for prioritizing candidate disease genes. IDLP effectively propagates labels throughout the PPI network and the phenotype similarity network, which keeps the method from failing when few disease genes are known. Meanwhile, IDLP models the bias caused by false positive protein interactions and other potential factors by treating the PPI network matrix and the phenotype similarity matrix as matrices to be learnt. By correcting the noise in the training matrices, it improves performance significantly. We conduct extensive experiments on OMIM datasets, in which IDLP demonstrates its effectiveness compared with eight state-of-the-art approaches. The robustness of IDLP is also validated by experiments with a disturbed PPI network. Furthermore, we search the literature to verify that the new genes predicted by IDLP are associated with the given diseases; the high prediction accuracy shows that IDLP can be a powerful tool to help biologists discover new disease genes.
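The abstract does not say how the PPI network was disturbed for the robustness experiments. One plausible setup, assumed here purely for illustration, is to inject random spurious edges that mimic false positive interactions. In the sketch below, the function `add_false_edges`, its parameters, and the toy network are hypothetical and are not taken from the IDLP code.

```python
import random


def add_false_edges(adj, n_extra, seed=0):
    """Return a copy of a symmetric 0/1 adjacency matrix with up to
    n_extra randomly chosen absent edges switched on."""
    rng = random.Random(seed)  # fixed seed keeps the perturbation reproducible
    n = len(adj)
    noisy = [row[:] for row in adj]
    # Candidate false positives: unordered node pairs that are not yet linked.
    absent = [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j] == 0]
    for i, j in rng.sample(absent, min(n_extra, len(absent))):
        noisy[i][j] = noisy[j][i] = 1
    return noisy


# Toy PPI network (assumed for illustration): a chain of genes 0 - 1 - 2 - 3.
ppi = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

noisy = add_false_edges(ppi, n_extra=2)
```

A prioritization method can then be run on both `ppi` and `noisy` and the two rankings compared; a robust method should change little between them.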

CONCLUSIONS

The IDLP model is an effective method for disease gene prioritization, particularly for query phenotypes without known associated genes, which makes it helpful for identifying disease genes for less-studied phenotypes.

AVAILABILITY

https://github.com/nkiip/IDLP.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f90/5806269/e13e95088d48/12859_2018_2040_Fig1_HTML.jpg
