Philips Institute for Oral Health Research, Virginia Commonwealth University, Richmond, Virginia, 23298, United States of America.
Application Services, Virginia Commonwealth University, Richmond, Virginia, United States of America.
Sci Rep. 2019 Sep 10;9(1):12949. doi: 10.1038/s41598-019-49098-w.
Experimental techniques for identifying essential genes (EGs) in prokaryotes are usually expensive and time-consuming, and sometimes impractical. Emerging in silico methods offer an alternative for EG prediction, but they often have limitations, including heavy computational requirements and a lack of biological explanation. Here we propose a new computational algorithm for EG prediction in prokaryotes, together with an online database (ePath) that provides quick access to EG prediction results for over 4,000 prokaryotes (https://www.pubapps.vcu.edu/epath/). In ePath, gene essentiality is linked to biological functions annotated by KEGG Ortholog (KO). Two new scoring systems, E_score and P_score, are proposed for each KO as EG evaluation criteria. E_score represents the appearance and essentiality of a given KO in existing experimental results on gene essentiality, while P_score denotes gene essentiality based on the principle that a gene is essential if it plays a role in genetic information processing, cell envelope maintenance or energy production. The new EG prediction algorithm shows prediction accuracy ranging from 75% to 91% when validated against five new experimental studies on EG identification. Our overall goal with ePath is to provide a comprehensive and reliable reference for gene essentiality annotation, facilitating the study of prokaryotes that lack experimentally derived gene essentiality information.
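The two-score idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: E_score is taken as the fraction of experimental datasets containing a KO in which that KO was essential, P_score as 1 when the KO is annotated with one of the three functional roles named above, and the two are combined by taking the maximum against a hypothetical threshold of 0.5. The names `KORecord`, `e_score`, `p_score` and `predict_essential`, and the KO identifiers, are hypothetical and are not ePath's API or data.

```python
from dataclasses import dataclass

# The three functional roles named in the abstract as indicating essentiality.
ESSENTIAL_ROLES = frozenset({
    "genetic information processing",
    "cell envelope maintenance",
    "energy production",
})


@dataclass(frozen=True)
class KORecord:
    """Hypothetical per-KO summary; the field layout is an assumption."""
    ko_id: str
    roles: frozenset   # functional roles annotated for this KO
    observed: int      # experimental datasets in which the KO appears
    essential: int     # of those, datasets in which it was found essential


def e_score(rec: KORecord) -> float:
    """Assumed form: share of observing datasets that report the KO as essential."""
    if rec.observed == 0:
        return 0.0  # no experimental evidence available
    return rec.essential / rec.observed


def p_score(rec: KORecord) -> float:
    """Assumed form: 1.0 if the KO has any of the three essential roles, else 0.0."""
    return 1.0 if rec.roles & ESSENTIAL_ROLES else 0.0


def predict_essential(rec: KORecord, threshold: float = 0.5) -> bool:
    """Assumed combination rule: call the KO essential if either score reaches the threshold."""
    return max(e_score(rec), p_score(rec)) >= threshold


if __name__ == "__main__":
    examples = [
        KORecord("KO_A", frozenset({"energy production"}), observed=0, essential=0),
        KORecord("KO_B", frozenset({"transport"}), observed=4, essential=1),
        KORecord("KO_C", frozenset(), observed=4, essential=3),
    ]
    for rec in examples:
        print(rec.ko_id, e_score(rec), p_score(rec), predict_essential(rec))
```

Under these assumptions, a KO with no experimental record can still be called essential through its functional role, which matches the stated aim of annotating prokaryotes that lack experimental essentiality data. The published algorithm may weight or combine the scores differently.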