Alakuş Talha Burak
Department of Software Engineering, Faculty of Engineering, Kırklareli University, 39100 Kırklareli, Turkey.
Biomimetics (Basel). 2023 May 23;8(2):218. doi: 10.3390/biomimetics8020218.
Recent studies have shown that DNA enhancers play an important role in regulating gene expression. They are involved in key biological processes such as development, homeostasis, and embryogenesis. However, identifying DNA enhancers experimentally is time-consuming and costly because it requires laboratory work. Researchers have therefore turned to computational alternatives, in particular deep learning algorithms. Yet the inconsistent and often poor prediction performance of computational approaches across cell lines has prompted further investigation of these approaches as well. In this study, a novel DNA encoding scheme was proposed to address these problems, and DNA enhancers were predicted with a bidirectional long short-term memory (BiLSTM) network. The study consisted of four stages applied to two scenarios. In the first stage, DNA enhancer data were obtained. In the second stage, DNA sequences were converted to numerical representations using the proposed encoding scheme and three existing schemes: EIIP (electron-ion interaction potential), integer number, and atomic number. In the third stage, the BiLSTM model was designed and the data were classified. In the final stage, the performance of each encoding scheme was evaluated using accuracy, precision, recall, F1-score, CSI, MCC, G-mean, Kappa coefficient, and AUC. In the first scenario, the task was to determine whether a DNA enhancer belonged to human or mouse. The proposed encoding scheme achieved the highest accuracy, 92.16%, with an AUC of 0.85. The next-best accuracy, 89.14%, was obtained with the EIIP scheme, whose AUC was 0.87. Of the remaining schemes, atomic number reached an accuracy of 86.61% and the integer scheme 76.96%, with AUC values of 0.84 and 0.82, respectively. In the second scenario, the task was to determine whether a sequence was a DNA enhancer and, if so, to which species it belonged. Here the proposed scheme again achieved the highest accuracy, 84.59%, with an AUC of 0.92. The EIIP and integer schemes achieved accuracies of 77.80% and 73.68%, respectively, with AUC scores close to 0.90. The atomic number scheme performed worst, with an accuracy of 68.27% and an AUC of 0.81. Overall, the proposed DNA encoding scheme was effective in predicting DNA enhancers.
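The second stage, converting DNA sequences to numerical representations, can be illustrated with a minimal sketch. The abstract does not list the per-nucleotide values, so the EIIP and atomic number tables below use commonly cited values and the integer table uses one common convention; all three are assumptions here, and the function name `encode` is hypothetical. The proposed encoding scheme is not shown because the abstract does not define it.

```python
# Lookup tables for three baseline DNA encoding schemes.
# ASSUMPTION: these values are commonly cited in the literature; the abstract
# itself does not state which values the study used.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}
ATOMIC_NUMBER = {"A": 70, "C": 58, "G": 78, "T": 66}
INTEGER = {"T": 0, "C": 1, "A": 2, "G": 3}  # one common convention, assumed


def encode(sequence, table, unknown=0.0):
    """Map each nucleotide to its numeric value.

    Symbols missing from the table (for example the ambiguity code N)
    are mapped to `unknown`.
    """
    return [float(table.get(base, unknown)) for base in sequence.upper()]


if __name__ == "__main__":
    example = "ACGTN"
    for name, table in (("EIIP", EIIP),
                        ("atomic number", ATOMIC_NUMBER),
                        ("integer", INTEGER)):
        print(name, encode(example, table))
```

Each encoded sequence is a list of floats with one value per base, which is the kind of numeric input a sequence model such as a BiLSTM would consume after padding or truncation to a fixed length.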
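Several of the evaluation measures named in the abstract can be computed directly from a confusion matrix. The sketch below covers the binary case only (as in the first scenario, human versus mouse); the second scenario has more than two classes and would need per-class or averaged variants. It assumes that CSI stands for the critical success index and uses the standard definitions of MCC, G-mean, and Cohen's kappa; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = _safe_div(tp + tn, n)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)          # sensitivity
    specificity = _safe_div(tn, tn + fp)
    f1 = _safe_div(2 * tp, 2 * tp + fp + fn)
    # ASSUMPTION: CSI is the critical success index, TP / (TP + FN + FP).
    csi = _safe_div(tp, tp + fn + fp)
    mcc = _safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    g_mean = math.sqrt(recall * specificity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = _safe_div((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), n * n)
    kappa = _safe_div(accuracy - expected, 1 - expected)
    return {
        "accuracy": accuracy, "precision": precision, "recall": recall,
        "f1": f1, "csi": csi, "mcc": mcc, "g_mean": g_mean, "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
        print(f"{name}: {value:.4f}")
```

AUC is omitted because it depends on the ranking of predicted scores rather than on a single confusion matrix.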