Intelligent Computing Lab, Institute of Intelligent Machines, Chinese Academy of Sciences, P.O. Box 1130, Hefei, Anhui 230031, China.
BMC Bioinformatics. 2010 Jun 24;11:343. doi: 10.1186/1471-2105-11-343.
Genetic interaction profiles are highly informative for understanding the functional linkages between genes, and have therefore been extensively exploited for annotating gene functions and dissecting specific pathway structures. However, our understanding of the relationship between double concurrent perturbations and higher-level phenotypic changes, e.g. those in cells, tissues or organs, remains rather limited. Modifier screens, such as synthetic genetic arrays (SGA), can help us to understand the phenotypes caused by combined gene mutations. Unfortunately, exhaustive tests of all possible combined mutations in a genome suffer from combinatorial explosion and are infeasible both technically and financially. An accurate computational approach to predicting genetic interactions is therefore highly desirable, and such methods have the potential to alleviate the bottleneck in experimental design.
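To give a sense of scale for the combinatorial explosion mentioned above, the number of distinct double mutants grows quadratically with the number of genes. The short calculation below is only an illustration: the figure of 6,000 genes is an assumed round number for S. cerevisiae and does not come from this abstract.

```python
from math import comb

# Assumption: roughly 6,000 genes, a round figure used only for illustration.
n_genes = 6000

# Number of unordered gene pairs, i.e. n * (n - 1) / 2 possible double mutants.
n_pairs = comb(n_genes, 2)

print(f"{n_genes} genes -> {n_pairs:,} candidate gene pairs")
```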
In this work, we introduce a computational systems biology approach for the accurate prediction of pairwise synthetic genetic interactions (SGI). First, a high-coverage and high-precision functional gene network (FGN) is constructed by integrating protein-protein interaction (PPI), protein complex and gene expression data. Then, a graph-based semi-supervised learning (SSL) classifier is used to identify SGI, taking the topological properties of protein pairs in the weighted FGN as its input features. To validate the method's ability to distinguish synthetic genetic interactions from non-interacting gene pairs, we compare the proposed SSL method with a state-of-the-art supervised classifier, the support vector machine (SVM), on a benchmark dataset in S. cerevisiae. Experimental results show that the proposed method can accurately predict genetic interactions in S. cerevisiae (with a sensitivity of 92% and a specificity of 91%). Notably, the SSL method is more efficient than the SVM, especially for very small training sets and large test sets.
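The general idea of graph-based semi-supervised classification can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a generic label-spreading scheme (a normalized affinity graph with a closed-form solution), and the function name `propagate_labels`, the parameters `alpha` and `sigma`, and the two-dimensional toy features are all hypothetical stand-ins for the topological features of gene pairs described above.

```python
import numpy as np

def propagate_labels(features, labels, alpha=0.9, sigma=1.0):
    """Spread labels over a similarity graph built from feature vectors.

    labels: 1 = interacting pair, 0 = non-interacting pair, -1 = unlabelled.
    Returns a predicted class (0 or 1) for every sample.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    n = X.shape[0]

    # Gaussian affinity between samples; no self-loops.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization S = D^(-1/2) W D^(-1/2).
    d = np.maximum(W.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # One-hot matrix of the known labels; unlabelled rows stay zero.
    Y = np.zeros((n, 2))
    for cls in (0, 1):
        Y[y == cls, cls] = 1.0

    # Closed-form solution of the spreading iteration F = alpha*S*F + Y.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)

# Toy data (assumption): two well-separated clusters standing in for
# interacting and non-interacting gene pairs.
rng = np.random.default_rng(0)
pos = rng.normal(loc=2.0, scale=0.3, size=(20, 2))
neg = rng.normal(loc=-2.0, scale=0.3, size=(20, 2))
X = np.vstack([pos, neg])
truth = np.array([1] * 20 + [0] * 20)

# Only one labelled example per class; everything else is unlabelled.
labels = np.full(40, -1)
labels[0] = 1
labels[20] = 0

pred = propagate_labels(X, labels)

# Sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
tp = int(((pred == 1) & (truth == 1)).sum())
fn = int(((pred == 0) & (truth == 1)).sum())
tn = int(((pred == 0) & (truth == 0)).sum())
fp = int(((pred == 1) & (truth == 0)).sum())
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

The sketch uses only two labelled samples to reflect the small-training-set setting emphasized above: the unlabelled samples contribute through the graph structure, which is what distinguishes this family of methods from a purely supervised classifier such as an SVM.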
We developed a graph-based SSL classifier for predicting SGI. The classifier takes the topological properties of the weighted FGN as input features and simultaneously exploits information from both labelled and unlabelled data. Our analysis indicates that the topological properties of the weighted FGN can be used to predict SGI accurately. In addition, the graph-based SSL method outperforms the standard supervised approach, especially when used with small training sets. The proposed method can alleviate the experimental burden of exhaustive testing and provide a useful guide for biologists in narrowing down the candidate gene pairs with SGI. The data and source code implementing the method are available from the website: http://home.ustc.edu.cn/~yzh33108/GeneticInterPred.htm.