Institute of Systems Science, 25 Heng Mui Keng Terrace, National University of Singapore, Singapore 119615, Singapore.
Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore 639798, Singapore.
Cells. 2019 Jul 23;8(7):767. doi: 10.3390/cells8070767.
Enhancers are short deoxyribonucleic acid (DNA) fragments that play an important role in gene expression. Because an enhancer may lie far from the gene it acts upon, identifying enhancers is difficult. Many published works have focused on identifying enhancers from their sequence information; however, the resulting performance still needs improvement. Using deep learning methods, this study proposes an ensemble of classifiers based on deep recurrent neural networks for predicting enhancers. The input features of the deep ensemble networks were generated from six types of dinucleotide physicochemical properties, which outperformed the other features. In summary, our ensemble model identified enhancers with a sensitivity of 75.5%, specificity of 76%, accuracy of 75.5%, and Matthews correlation coefficient (MCC) of 0.51. For classifying enhancers as strong or weak, our model reached a sensitivity of 83.15%, specificity of 45.61%, accuracy of 68.49%, and MCC of 0.312. Compared with the benchmark results, our model performed better on most evaluation metrics. These results show that deep model ensembles have the potential to improve on the best results achieved to date using shallow machine learning methods.