School of Microelectronics and Communication Engineering, Chongqing University, 174 Shapingba District, Chongqing, 400044, China.
Genes Genomics. 2020 Jan;42(1):97-106. doi: 10.1007/s13258-019-00884-w. Epub 2019 Nov 17.
Rapid identification of new essential genes is necessary both to understand biological mechanisms and to identify potential targets for antimicrobial drugs. Many computational methods have been proposed for this task.
The aim of this study was to construct an essential-gene classifier that generalizes across a wider range of organisms, and to study the redundancy of the features used to predict essential genes.
We designed a 57-12-1 artificial neural network model to predict the essential genes of 31 prokaryotic genomes. Predictive performance was assessed with four schemes: self-prediction within each organism, leave-one-genome-out prediction, prediction of all organisms from a single organism, and self-prediction over all organisms pooled together. Additionally, the 57 features used in the artificial neural network model were analyzed by weighted principal component analysis to screen for the key features most strongly related to gene essentiality.
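To make the "57-12-1" architecture and the leave-one-genome-out scheme concrete, here is a minimal NumPy sketch: 57 input features, one hidden layer of 12 units, and a single output score for essentiality. The sigmoid activations, random initialization, binary cross-entropy loss, plain batch gradient descent, and all function names (`forward`, `train`, `leave_one_genome_out`) are illustrative assumptions, not the authors' published implementation, and the demo data are synthetic.

```python
import numpy as np

# Layer sizes from the 57-12-1 architecture: 57 input features,
# 12 hidden units, 1 output (score that a gene is essential).
N_IN, N_HIDDEN, N_OUT = 57, 12, 1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(rng):
    # Assumption: small random weights; the paper's initialization is not stated.
    return {
        "W1": rng.normal(0.0, 0.1, size=(N_IN, N_HIDDEN)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(0.0, 0.1, size=(N_HIDDEN, N_OUT)),
        "b2": np.zeros(N_OUT),
    }

def forward(params, X):
    # X has shape (n_genes, 57); returns hidden activations and scores in (0, 1).
    H = sigmoid(X @ params["W1"] + params["b1"])
    y = sigmoid(H @ params["W2"] + params["b2"])
    return H, y.ravel()

def train(X, t, epochs=200, lr=0.5, seed=0):
    # Assumption: batch gradient descent on mean binary cross-entropy.
    p = init_params(np.random.default_rng(seed))
    n = len(t)
    for _ in range(epochs):
        H, y = forward(p, X)
        d_out = (y - t).reshape(-1, 1) / n           # dLoss/dz at the output (sigmoid + BCE)
        d_hid = (d_out @ p["W2"].T) * H * (1.0 - H)  # backpropagate through hidden sigmoid
        p["W2"] -= lr * H.T @ d_out
        p["b2"] -= lr * d_out.sum(axis=0)
        p["W1"] -= lr * X.T @ d_hid
        p["b1"] -= lr * d_hid.sum(axis=0)
    return p

def leave_one_genome_out(genome_ids):
    # Yields (held_out_genome, train_idx, test_idx): each genome is tested
    # exactly once by a model trained on all remaining genomes.
    genome_ids = np.asarray(genome_ids)
    for g in np.unique(genome_ids):
        yield g, np.flatnonzero(genome_ids != g), np.flatnonzero(genome_ids == g)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(90, N_IN))            # synthetic features, not real data
    t = (X[:, 0] > 0).astype(float)            # synthetic labels
    genomes = np.repeat(["gA", "gB", "gC"], 30)
    for g, tr, te in leave_one_genome_out(genomes):
        p = train(X[tr], t[tr])
        _, scores = forward(p, X[te])
        print(g, float(np.mean((scores > 0.5) == t[te])))
```

The leave-one-genome-out loop is what distinguishes a cross-organism generalization test from per-organism self-prediction: no gene from the held-out genome is ever seen during training.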
Compared with previous studies, our results indicate that our models generalize better. Furthermore, the weighted principal component analysis reduced the feature set from 57 to 29 while keeping overall prediction performance stable, suggesting that some features are redundant for gene essentiality and that the screened features carry more of the biological information important for gene essentiality.
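The abstract does not specify the weighting scheme of its weighted principal component analysis. The sketch below assumes one plausible variant: standardize the features, compute principal components, score each feature by the sum of its absolute loadings weighted by each component's share of explained variance, and keep the 29 highest-scoring features. The functions `wpca_feature_scores` and `select_features` are hypothetical names introduced for illustration, and the code assumes no feature column is constant.

```python
import numpy as np

def wpca_feature_scores(X, n_components=None):
    # Standardize columns so features on different scales are comparable
    # (assumes every column has nonzero variance).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eigen-decomposition of the covariance of standardized data (the correlation matrix).
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    order = np.argsort(eigvals)[::-1]              # components by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is None:
        n_components = X.shape[1]
    weights = eigvals[:n_components] / eigvals.sum()   # explained-variance ratios
    # Assumed score: |loading| on each component, weighted by that component's variance share.
    return np.abs(eigvecs[:, :n_components]) @ weights

def select_features(X, n_keep=29, n_components=None):
    # Returns the sorted indices of the n_keep highest-scoring features and all scores.
    scores = wpca_feature_scores(X, n_components)
    keep = np.argsort(scores)[::-1][:n_keep]
    return np.sort(keep), scores
```

Under this variant, features that co-vary strongly with the dominant components score highest; whether that matches the screening actually used in the study would need to be checked against the full paper.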
This study showed the effectiveness and generalizability of our artificial neural network model. In addition, the screened features could be used as key features in computational analysis and biological experiments.