Michael Seringhaus, Alberto Paccanaro, Anthony Borneman, Michael Snyder, Mark Gerstein
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.
Genome Res. 2006 Sep;16(9):1126-35. doi: 10.1101/gr.5144106. Epub 2006 Aug 9.
Essential genes are required for an organism's viability, and the ability to identify these genes in pathogens is crucial to directed drug development. Predicting essential genes through computational methods is appealing because it circumvents expensive and difficult experimental screens. Most such prediction is based on homology mapping to experimentally verified essential genes in model organisms. We present here a different approach, one that relies exclusively on sequence features of a gene to estimate essentiality and offers a promising way to identify essential genes in unstudied or uncultured organisms. We identified 14 characteristic sequence features potentially associated with essentiality, such as localization signals, codon adaptation, GC content, and overall hydrophobicity. Using the well-characterized baker's yeast Saccharomyces cerevisiae, we employed a simple Bayesian framework to measure the correlation of each of these features with essentiality. We then employed the 14 features to learn the parameters of a machine learning classifier capable of predicting essential genes. We trained our classifier on known essential genes in S. cerevisiae and applied it to the closely related and relatively unstudied yeast Saccharomyces mikatae. We assessed predictive success in two ways: First, we compared all of our predictions with those generated by homology mapping between these two species. Second, we verified a subset of our predictions with eight in vivo knockouts in S. mikatae, and we present here the first experimentally confirmed essential genes in this species.
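As a rough illustration of how a single sequence feature might be scored against essentiality, the sketch below computes GC content and a smoothed likelihood ratio, P(feature | essential) / P(feature | non-essential), for a binarized feature. The abstract does not specify the exact Bayesian formulation, so the likelihood ratio, the pseudocount, the 0.5 GC threshold, the function names, and the toy sequences and labels are all assumptions made for this example, not the authors' method or data.

```python
from typing import Sequence


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def likelihood_ratio(
    has_feature: Sequence[bool],
    is_essential: Sequence[bool],
    pseudocount: float = 1.0,
) -> float:
    """Estimate P(feature | essential) / P(feature | non-essential).

    A pseudocount (assumed here, not taken from the paper) keeps the
    ratio finite when a class has no genes carrying the feature.
    """
    pairs = list(zip(has_feature, is_essential))
    ess_total = sum(1 for _, e in pairs if e)
    non_total = len(pairs) - ess_total
    ess_with = sum(1 for f, e in pairs if f and e)
    non_with = sum(1 for f, e in pairs if f and not e)
    p_ess = (ess_with + pseudocount) / (ess_total + 2 * pseudocount)
    p_non = (non_with + pseudocount) / (non_total + 2 * pseudocount)
    return p_ess / p_non


# Hypothetical toy data: (sequence, essential?) pairs invented for illustration.
toy_genes = [
    ("ATGGCGCCGGCC", True),
    ("ATGGGCCCGTAA", True),
    ("ATGAAATTTTAA", False),
    ("ATGATATTAAAT", False),
]

# Binarize the feature: is the gene GC-rich (GC content above 0.5)?
gc_rich = [gc_content(seq) > 0.5 for seq, _ in toy_genes]
labels = [essential for _, essential in toy_genes]

print(likelihood_ratio(gc_rich, labels))
```

A ratio above 1 would suggest the feature is enriched among essential genes in the training set, and a ratio below 1 that it is depleted. A full classifier in the spirit of the abstract would combine all 14 features rather than score one in isolation.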