Liu Xiao, Luo Yachuan, He Ting, Ren Meixiang, Xu Yuqiao
School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China.
J Microbiol Methods. 2021 Sep;188:106297. doi: 10.1016/j.mimet.2021.106297. Epub 2021 Jul 31.
Essential genes are required for the reproduction and survival of an organism, and identifying them rapidly has practical value in biomedicine. Information theory studies the transmission of information. Because heredity resembles information transmission, measures derived from information theory can be applied to genetic sequence analysis at different scales. In this study, we used 114 features extracted with information-theoretic methods to build an essential gene prediction model. We trained a backpropagation neural network as the classifier and used it to predict the essential genes of 37 prokaryotes. Classifier performance was evaluated by intra-organism prediction and leave-one-species-out prediction, which yielded average AUC scores across the 37 prokaryotes of 0.791 and 0.717, respectively. To address potential redundancy in the feature set, we performed feature selection and constructed a key feature subset. Under the same two prediction schemes, the key features yielded average AUC scores across the 37 organisms of 0.786 and 0.714, respectively. These results show the potential and universality of information-theoretic features for essential gene prediction in prokaryotes.
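As a rough illustration of two ideas named in the abstract, the sketch below computes one simple information-theoretic measure of a gene sequence (the Shannon entropy of its k-mer distribution) and generates leave-one-species-out splits. The abstract does not list the 114 features or the network configuration, so the function names `kmer_entropy` and `leave_one_species_out`, the choice of k-mer entropy as a feature, and the index-based split format are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=1):
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq.

    Hypothetical example feature; the paper's actual 114 features are not
    specified in the abstract.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0  # sequence shorter than k: no k-mers to count
    total = len(kmers)
    probs = [count / total for count in Counter(kmers).values()]
    # H = sum over k-mers of p * log2(1 / p)
    return sum(p * math.log2(1.0 / p) for p in probs)


def leave_one_species_out(species_labels):
    """Yield (held_out, train_idx, test_idx), holding out one species at a time.

    species_labels[i] is the species of gene i (assumed data layout). A model
    would be trained on train_idx and scored on test_idx for each split.
    """
    for held_out in sorted(set(species_labels)):
        train_idx = [i for i, s in enumerate(species_labels) if s != held_out]
        test_idx = [i for i, s in enumerate(species_labels) if s == held_out]
        yield held_out, train_idx, test_idx
```

In such a setup, each gene would be mapped to a feature vector (for example, `kmer_entropy` at several values of `k`), and the per-species AUC would be averaged over all splits; the classifier itself is omitted here.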