Inductive matrix completion for predicting gene-disease associations.

Affiliation

Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA.

Publication information

Bioinformatics. 2014 Jun 15;30(12):i60-68. doi: 10.1093/bioinformatics/btu269.

DOI:10.1093/bioinformatics/btu269

PMID:24932006

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4058925/

Abstract

MOTIVATION

Most existing methods for predicting causal disease genes rely on a specific type of evidence and are therefore limited in applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies: for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive: it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods, which are transductive.
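The idea of learning latent factors from gene and disease features can be illustrated with a minimal sketch. It assumes a bilinear model in which the association score for gene i and disease j is x_i^T W H^T y_j, where x_i and y_j are feature vectors and W, H are low-rank factor matrices. The abstract does not state the objective or solver, so the unregularized alternating least-squares fit below, the function names, and the choice to treat every entry of P (zeros included) as a target are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def imc_als(P, X, Y, rank, iters=20, seed=0):
    """Fit W, H so that P ~= X @ W @ H.T @ Y.T (hypothetical helper).

    P: (n_genes, n_diseases) association matrix
    X: (n_genes, f_g) gene features; Y: (n_diseases, f_d) disease features
    Unregularized alternating least squares; an assumed, simplified solver.
    """
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((Y.shape[1], rank))
    X_pinv = np.linalg.pinv(X)
    Y_pinv = np.linalg.pinv(Y)
    for _ in range(iters):
        # Fix H: least-squares solution of min_W ||X W (Y H)^T - P||_F
        W = X_pinv @ P @ np.linalg.pinv(Y @ H).T
        # Fix W: least-squares solution of min_H ||(X W) H^T Y^T - P||_F
        H = (np.linalg.pinv(X @ W) @ P @ Y_pinv.T).T
    return W, H

def score_new_disease(X, y_new, W, H):
    """Score every gene for a disease described only by its feature vector."""
    return X @ W @ (H.T @ y_new)

# Synthetic demo: plant a rank-2 model, then fit it back.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 5))   # 30 genes, 5 gene features
Y = rng.standard_normal((20, 4))   # 20 diseases, 4 disease features
P = X @ rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4)) @ Y.T
W, H = imc_als(P, X, Y, rank=2)
gene_scores = score_new_disease(X, rng.standard_normal(4), W, H)
```

Because the learned factors act on features rather than on row or column indices, score_new_disease needs only a feature vector for the query disease. That is the sense in which such a model is inductive: a disease with no known gene associations at training time can still be scored.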

RESULTS

Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, whereas the recently proposed Catapult method (second best) has a <15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e. genes not previously linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature.
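The "chance of recovering a true association in the top 100 predictions" can be read as a top-k recovery rate over held-out gene-disease pairs. The sketch below is one plausible way to compute such a number. The function name, the input layout, and the choice to rank each held-out gene against all genes (without filtering out known training associations) are assumptions, not the evaluation protocol of the paper.

```python
import numpy as np

def recovery_rate_at_k(scores, held_out, k=100):
    """Fraction of held-out (gene, disease) pairs ranked in the top k.

    scores: (n_genes, n_diseases) array of predicted association scores
    held_out: list of (gene_index, disease_index) true pairs hidden in training
    """
    hits = 0
    for g, d in held_out:
        # Number of genes scored strictly higher than gene g for disease d
        rank = int(np.sum(scores[:, d] > scores[g, d]))
        hits += int(rank < k)
    return hits / len(held_out)
```

Under this reading, a value near 0.25 at k=100 would correspond to the "one-in-four" figure quoted above, and a value below 0.15 to the figure quoted for Catapult.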

AVAILABILITY

Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7200/4058925/7d4ac5235d2b/btu269f1.jpg

Similar articles

Deep Collaborative Filtering for Prediction of Disease Genes.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1639-1647. doi: 10.1109/TCBB.2019.2907536. Epub 2019 Mar 26.

Inferring disease and gene set associations with rank coherence in networks.

Bioinformatics. 2011 Oct 1;27(19):2692-9. doi: 10.1093/bioinformatics/btr463. Epub 2011 Aug 8.

Stable solution to l2,1-based robust inductive matrix completion and its application in linking long noncoding RNAs to human diseases.

BMC Med Genomics. 2017 Dec 28;10(Suppl 5):77. doi: 10.1186/s12920-017-0310-1.

Robust Inductive Matrix Completion Strategy to Explore Associations Between LincRNAs and Human Disease Phenotypes.

IEEE/ACM Trans Comput Biol Bioinform. 2019 Nov-Dec;16(6):2066-2077. doi: 10.1109/TCBB.2018.2844816. Epub 2018 Jun 7.

Predicting human microbe-disease associations via graph attention networks with inductive matrix completion.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa146.

Disease gene prioritization by integrating tissue-specific molecular networks using a robust multi-network model.

BMC Bioinformatics. 2016 Nov 10;17(1):453. doi: 10.1186/s12859-016-1317-x.

DRIMC: an improved drug repositioning approach using Bayesian inductive matrix completion.

Bioinformatics. 2020 May 1;36(9):2839-2847. doi: 10.1093/bioinformatics/btaa062.

Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction.

Bioinformatics. 2020 Apr 15;36(8):2538-2546. doi: 10.1093/bioinformatics/btz965.

A knowledge-based approach for predicting gene-disease associations.

Bioinformatics. 2016 Sep 15;32(18):2831-8. doi: 10.1093/bioinformatics/btw358. Epub 2016 Jun 9.

Cited by

A Deep Differential Analysis in Four Subtypes of Breast Cancer Based on Regulations of miRNA-mRNA.

IET Syst Biol. 2025 Jan-Dec;19(1):e70020. doi: 10.1049/syb2.70020.

Prediction of miRNA-disease association based on multisource inductive matrix completion.

Sci Rep. 2024 Nov 11;14(1):27503. doi: 10.1038/s41598-024-78212-w.

MGACL: Prediction Drug-Protein Interaction Based on Meta-Graph Association-Aware Contrastive Learning.

Biomolecules. 2024 Oct 8;14(10):1267. doi: 10.3390/biom14101267.

Heterogeneous biomedical entity representation learning for gene-disease association prediction.

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae380.

Current and future directions in network biology.

Bioinform Adv. 2024 Aug 14;4(1):vbae099. doi: 10.1093/bioadv/vbae099. eCollection 2024.

NGCN: Drug-target interaction prediction by integrating information and feature learning from heterogeneous network.

J Cell Mol Med. 2024 Apr;28(7):e18224. doi: 10.1111/jcmm.18224.

Toward Unified AI Drug Discovery with Multimodal Knowledge.

Health Data Sci. 2024 Feb 23;4:0113. doi: 10.34133/hds.0113. eCollection 2024.

Predicting drug-protein interactions by preserving the graph information of multi source data.

BMC Bioinformatics. 2024 Jan 4;25(1):10. doi: 10.1186/s12859-023-05620-6.

ReGeNNe: genetic pathway-based deep neural network using canonical correlation regularizer for disease prediction.

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad679.

Drug-target interaction prediction based on spatial consistency constraint and graph convolutional autoencoder.

BMC Bioinformatics. 2023 Apr 17;24(1):151. doi: 10.1186/s12859-023-05275-3.

References

Multitask learning for host-pathogen protein interactions.

Bioinformatics. 2013 Jul 1;29(13):i217-26. doi: 10.1093/bioinformatics/btt245.

Prediction and validation of gene-disease associations using methods inspired by social network analyses.

PLoS One. 2013 May 1;8(5):e58977. doi: 10.1371/journal.pone.0058977. Print 2013.

An unbiased evaluation of gene prioritization tools.

Bioinformatics. 2012 Dec 1;28(23):3081-8. doi: 10.1093/bioinformatics/bts581. Epub 2012 Oct 9.

Computational tools for prioritizing candidate genes: boosting disease gene discovery.

Nat Rev Genet. 2012 Jul 3;13(8):523-36. doi: 10.1038/nrg3253.

Computational approaches to disease-gene prediction: rationale, classification and successes.

FEBS J. 2012 Mar;279(5):678-96. doi: 10.1111/j.1742-4658.2012.08471.x. Epub 2012 Jan 30.

ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples.

BMC Bioinformatics. 2011 Oct 6;12:389. doi: 10.1186/1471-2105-12-389.

Prioritizing candidate disease genes by network-based boosting of genome-wide association data.

Genome Res. 2011 Jul;21(7):1109-21. doi: 10.1101/gr.118992.110. Epub 2011 May 2.

A high-resolution C. elegans essential gene network based on phenotypic profiling of a complex tissue.

Cell. 2011 Apr 29;145(3):470-82. doi: 10.1016/j.cell.2011.03.037.

Phenotypic landscape of a bacterial cell.

Cell. 2011 Jan 7;144(1):143-56. doi: 10.1016/j.cell.2010.11.052. Epub 2010 Dec 23.

Network medicine: a network-based approach to human disease.

Nat Rev Genet. 2011 Jan;12(1):56-68. doi: 10.1038/nrg2918.
