Prioritizing disease genes with an improved dual label propagation framework.

Author information

College of Software, Nankai University, Tianjin, 300350, China.

School of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin, 300222, China.

Publication information

BMC Bioinformatics. 2018 Feb 8;19(1):47. doi: 10.1186/s12859-018-2040-6.

DOI:10.1186/s12859-018-2040-6

PMID:29422030

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5806269/

Abstract

BACKGROUND

Disease gene prioritization aims to identify potential disease-causing genes for a given phenotype, which can help reveal the inherited basis of human diseases and facilitate drug development. Our work is motivated by the label propagation algorithm and by the false positive protein-protein interactions (PPIs) present in the dataset. To the best of our knowledge, false positive protein-protein interactions have not previously been considered in disease gene prioritization. Label propagation has been successfully applied to prioritize disease-causing genes in previous network-based methods. These methods apply basic label propagation, i.e. random walk, on networks to prioritize disease genes in different ways. However, none of them can handle a dataset that contains many false positive protein-protein interactions, because they all treat the PPI network as a fixed input. This characteristic of the data source may cause large deviations in the results.
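To make the last point concrete, the following is a minimal sketch of basic label propagation on a fixed network. It is not the IDLP method itself. The five-gene toy network, the seed gene, the damping factor `alpha`, and the function names are all illustrative assumptions; the update rule is the standard iteration f <- alpha * S f + (1 - alpha) * y with a symmetrically normalized adjacency matrix S.

```python
import math


def normalize(adj):
    """Symmetrically normalize an adjacency matrix: S = D^-1/2 W D^-1/2."""
    degrees = [sum(row) for row in adj]
    n = len(adj)
    return [
        [
            adj[i][j] / math.sqrt(degrees[i] * degrees[j]) if adj[i][j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def propagate_labels(adj, seeds, alpha=0.8, tol=1e-9, max_iter=1000):
    """Iterate f <- alpha * S f + (1 - alpha) * y until the scores stabilize."""
    s = normalize(adj)
    n = len(adj)
    # y marks the known disease genes (the seed labels).
    y = [1.0 if i in seeds else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(max_iter):
        new_f = [
            alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f


def path_network(n):
    """Hypothetical toy PPI network: genes 0..n-1 connected in a chain."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1.0
    return adj


# Gene 0 is the only known disease gene in this toy example.
clean_net = path_network(5)
clean = propagate_labels(clean_net, seeds={0})

# Add one spurious (false positive) interaction between genes 0 and 4.
noisy_net = path_network(5)
noisy_net[0][4] = noisy_net[4][0] = 1.0
noisy = propagate_labels(noisy_net, seeds={0})

print("clean scores:", [round(x, 3) for x in clean])
print("noisy scores:", [round(x, 3) for x in noisy])
```

In the clean chain, scores fall off with distance from the seed, so gene 4 ranks last. With the single spurious edge, gene 4 becomes a direct neighbour of the seed and overtakes gene 2. This is the kind of deviation a method cannot correct when it takes the network as a fixed input.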

RESULTS

A novel network-based framework, IDLP, is proposed to prioritize candidate disease genes. IDLP effectively propagates labels throughout the PPI network and the phenotype similarity network, which prevents the method from failing when few disease genes are known. Meanwhile, IDLP models the bias caused by false positive protein interactions and other potential factors by treating the PPI network matrix and the phenotype similarity matrix as matrices to be learnt. By correcting the noise in the training matrices, it improves performance significantly. We conduct extensive experiments on OMIM datasets, in which IDLP proves effective compared with eight state-of-the-art approaches. The robustness of IDLP is also validated by experiments with a disturbed PPI network. Furthermore, we search the literature to verify that the new genes predicted by IDLP are associated with the given diseases; the high prediction accuracy shows that IDLP can be a powerful tool to help biologists discover new disease genes.

CONCLUSIONS

The IDLP model is an effective method for disease gene prioritization, particularly for query phenotypes without known associated genes, which would greatly help in identifying disease genes for less studied phenotypes.

AVAILABILITY

https://github.com/nkiip/IDLP.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f90/5806269/e13e95088d48/12859_2018_2040_Fig1_HTML.jpg
