Žitnik Marinka, Zupan Blaž
1 Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia.
2 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas.
J Comput Biol. 2015 Jun;22(6):595-608. doi: 10.1089/cmb.2014.0158. Epub 2015 Feb 6.
The epistatic miniarray profile (E-MAP) is a popular large-scale platform for discovering genetic interactions. E-MAPs produce quantitative output, which makes it possible to detect subtle interactions with greater precision. However, owing to technological limitations, E-MAP studies fail to measure genetic interactions for up to 40% of gene pairs in an assay. Missing measurements can be recovered by computational data imputation, which completes the interaction profiles and enables downstream analysis algorithms that would otherwise be sensitive to missing values. We introduce a new method for imputing interaction data, called network-guided matrix completion (NG-MC). The core of NG-MC is low-rank probabilistic matrix completion that incorporates prior knowledge represented as a collection of gene networks. NG-MC assumes that interactions are transitive, so that the latent interaction profile it infers for a gene depends on the profiles of that gene's direct neighbors in the gene networks. As the NG-MC inference algorithm progresses, it propagates latent interaction profiles through each of the networks and updates gene network weights toward improved prediction. In a study of four different E-MAP assays that considered protein-protein interaction and Gene Ontology similarity networks, NG-MC significantly surpassed existing alternative techniques. Including information from gene networks also allowed NG-MC to predict interactions for genes that were not included in the original E-MAP assays, a task that current imputation approaches cannot address.
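To make the idea of network-guided low-rank completion concrete, the following is a minimal sketch and not the NG-MC algorithm itself. It assumes a single gene network with fixed weights and swaps the probabilistic model for plain gradient descent on a squared-error objective with a graph Laplacian penalty, which pulls the latent profiles of neighboring genes toward each other. NG-MC, as described above, uses a collection of networks and updates their weights during inference; none of that is reproduced here. The function name `laplacian_matrix_completion` and all parameters (`rank`, `lam`, `alpha`, `lr`, `n_iter`) are hypothetical choices made for illustration.

```python
import numpy as np


def laplacian_matrix_completion(X, mask, A, rank=2, lam=0.1, alpha=0.1,
                                lr=0.005, n_iter=3000, seed=0):
    """Illustrative low-rank completion with a graph Laplacian penalty.

    X    : (n, n) array of interaction scores; unobserved entries may be NaN.
    mask : (n, n) boolean array, True where X was measured.
    A    : (n, n) symmetric, non-negative adjacency matrix of one gene network.
    Returns an (n, n) array of predicted scores for every gene pair.
    """
    X = np.asarray(X, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    A = np.asarray(A, dtype=float)
    n = X.shape[0]

    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n, rank))  # latent profiles (row side)
    V = 0.1 * rng.standard_normal((n, rank))  # latent profiles (column side)

    # Unnormalised graph Laplacian: trace(U^T L U) sums A_ij * ||U_i - U_j||^2
    # over edges, so minimising it makes neighbouring genes' profiles similar.
    L = np.diag(A.sum(axis=1)) - A

    X0 = np.where(mask, X, 0.0)  # replace missing values so arithmetic is safe

    for _ in range(n_iter):
        R = mask * (U @ V.T - X0)                  # residual on observed pairs
        grad_U = R @ V + lam * U + alpha * (L @ U)
        grad_V = R.T @ U + lam * V + alpha * (L @ V)
        U -= lr * grad_U
        V -= lr * grad_V

    return U @ V.T
```

Because every gene that appears in the network receives a latent profile through the Laplacian term, a sketch of this kind can in principle produce scores even for genes with no measured pairs, which mirrors the point made in the abstract about genes absent from the original assays. The step size and penalty weights here are untuned placeholders and would need adjusting for real E-MAP data.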