Department of Computer Science and Engineering, University of California Riverside, 900 University Ave, Riverside, 92507, CA, USA.
BMC Bioinformatics. 2020 Sep 30;21(Suppl 14):367. doi: 10.1186/s12859-020-03688-y.
Essential genes are those that are critical for the survival of an organism. Predicting essential genes in bacteria can provide targets for the design of novel antibiotic compounds or antimicrobial strategies.
We propose a deep neural network for predicting essential genes in microbes. Our architecture, called DEEPLYESSENTIAL, makes minimal assumptions about the input data: it uses only the gene primary sequence and the corresponding protein sequence to carry out the prediction. This maximizes its practical applicability compared to existing predictors, which require structural or topological features that might not be readily available. We also expose and study a hidden performance bias that affected previous classifiers. Extensive results show that DEEPLYESSENTIAL outperforms existing classifiers that either employ down-sampling to balance the training set or use clustering to exclude multiple copies of orthologous genes.
Deep neural network architectures can efficiently predict whether a microbial gene is essential (or not) using only its sequence information.
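To make "using only its sequence information" concrete, the following is a minimal sketch of how a gene and its protein product might be turned into a fixed-length numeric vector suitable as input to a neural network. The abstract does not list the features DEEPLYESSENTIAL uses, so the choice here (codon frequencies, amino acid frequencies, GC content, and sequence lengths) is an assumption made for illustration, and the function name `sequence_features` is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # all 64 codons


def sequence_features(dna: str, protein: str) -> list[float]:
    """Encode one gene as an 87-dimensional vector from sequence alone.

    Illustrative feature set (an assumption, not taken from the abstract):
    64 codon frequencies + 20 amino acid frequencies + GC content
    + gene length + protein length.
    """
    dna = dna.upper()
    protein = protein.upper()

    # Split the coding sequence into in-frame codons, dropping any
    # trailing partial codon.
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    codon_counts = Counter(codons)
    n_codons = max(len(codons), 1)  # avoid division by zero
    # Codons containing ambiguous bases (e.g. N) count toward the total
    # but have no slot of their own in the vector.
    codon_freq = [codon_counts[c] / n_codons for c in CODONS]

    aa_counts = Counter(protein)
    n_aa = max(len(protein), 1)
    aa_freq = [aa_counts[a] / n_aa for a in AMINO_ACIDS]

    gc_content = (dna.count("G") + dna.count("C")) / max(len(dna), 1)

    return codon_freq + aa_freq + [gc_content, float(len(dna)), float(len(protein))]


# Toy example: a 4-codon gene (ATG GCC AAA TAA) and its 3-residue protein.
features = sequence_features("ATGGCCAAATAA", "MAK")
```

A vector like `features` could then be fed to any standard classifier. Because nothing beyond the two sequences is needed, the same encoding applies to a newly sequenced genome for which no structural or network data exist.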