Lloyd John P, Seddon Alexander E, Moghe Gaurav D, Simenc Matthew C, Shiu Shin-Han
Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824.
Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824.
Plant Cell. 2015 Aug;27(8):2133-47. doi: 10.1105/tpc.15.00051. Epub 2015 Aug 18.
Essential genes represent critical cellular components whose disruption results in lethality. Characteristics shared among essential genes have been uncovered in fungal and metazoan model systems. However, features associated with plant essential genes are largely unknown, and the full set of essential genes remains to be discovered in any plant species. Here, we show that essential genes in Arabidopsis thaliana have distinct features useful for constructing within- and cross-species prediction models. Compared with genes with nonlethal mutant phenotypes, essential genes in A. thaliana are often single copy or derived from older duplications, highly and broadly expressed, slow evolving, and highly connected within molecular networks. These gene features allowed the application of machine learning methods that predicted known lethal genes as well as an additional 1970 likely essential genes without documented phenotypes. Prediction models from A. thaliana could also be applied to predict Oryza sativa and Saccharomyces cerevisiae essential genes. Importantly, successful predictions drew upon many features, whereas no single feature was sufficient on its own. Our findings show that essential genes can be distinguished from genes with nonlethal phenotypes using features that are similar across kingdoms, and they indicate that our approach could be applied translationally to species without extensive functional genomic and phenomic resources.
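The point that predictions drew on many features rather than any single one can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature names are hypothetical labels loosely following the abstract, the data are synthetic random numbers rather than real gene measurements, and a plain logistic regression stands in for the unspecified machine learning methods used in the study. It is not the authors' pipeline.

```python
import math
import random

# Hypothetical feature names loosely following the abstract; all values are synthetic.
FEATURES = ["expression_level", "expression_breadth", "network_degree",
            "evolutionary_rate", "paralog_count"]
# Assumed direction of the shift for essential genes (+ higher, - lower).
SHIFT = [0.8, 0.8, 0.8, -0.8, -0.8]


def make_toy_genes(n_per_class, rng):
    """Return (rows, labels) of synthetic, standardized feature vectors."""
    rows, labels = [], []
    for label in (1, 0):  # 1 = lethal (essential), 0 = nonlethal phenotype
        for _ in range(n_per_class):
            rows.append([rng.gauss(label * s, 1.0) for s in SHIFT])
            labels.append(label)
    return rows, labels


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Predicted probability that a gene with features x is essential."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(rows, labels, epochs=200, lr=0.1):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n_feat = len(rows[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(rows, labels):
            err = predict(w, b, x) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def compare_models(seed=0, n_per_class=200):
    """Train on one synthetic set, then score combined vs single features on another."""
    rng = random.Random(seed)
    train_x, train_y = make_toy_genes(n_per_class, rng)
    test_x, test_y = make_toy_genes(n_per_class, rng)
    w, b = train_logistic(train_x, train_y)
    combined = auc([predict(w, b, x) for x in test_x], test_y)
    # Single-feature score: the feature alone, oriented by its learned weight.
    single = {name: auc([w[j] * x[j] for x in test_x], test_y)
              for j, name in enumerate(FEATURES)}
    return {"combined": combined, "single": single}


if __name__ == "__main__":
    result = compare_models()
    print(f"all features: AUC = {result['combined']:.2f}")
    for name, value in result["single"].items():
        print(f"{name} only: AUC = {value:.2f}")
```

Because each synthetic feature is only weakly informative by construction, the combined model would be expected to separate the two classes better than any one feature, which mirrors the abstract's claim in a toy setting and says nothing about the actual effect sizes in A. thaliana.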