Suleman Muhammad Taseer, Alturise Fahad, Alkhalifah Tamim, Khan Yaser Daanial
Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan.
Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia.
Digit Health. 2023 Mar 29;9:20552076231165963. doi: 10.1177/20552076231165963. eCollection 2023 Jan-Dec.
Dihydrouridine (D) is one of the most significant uridine modifications and occurs prominently in eukaryotes. This modification contributes to the folding and conformational flexibility of transfer RNA (tRNA).
The modification has also been linked to lung cancer in humans. D sites have traditionally been identified through conventional laboratory methods, which are costly and time-consuming. The availability of RNA sequences makes it possible to identify D sites with computationally intelligent models instead. The most challenging part, however, is turning these biological sequences into discrete numerical vectors.
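To illustrate what turning a sequence into a vector can look like, the following minimal sketch encodes an RNA sequence as normalized k-mer frequencies. This is a generic, widely used encoding and is not the novel feature extraction mechanism proposed in this study; the function name `kmer_frequency_vector`, the choice of k = 2, and the example sequence are assumptions made purely for illustration.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA bases; DNA-style "T" is mapped to "U" below


def kmer_frequency_vector(sequence, k=2):
    """Encode an RNA sequence as a fixed-length vector of k-mer frequencies.

    Hypothetical helper for illustration only; it is not the paper's method.
    """
    sequence = sequence.upper().replace("T", "U")
    # Enumerate all 4**k possible k-mers in a fixed order.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(kmers)
    # Normalize so that sequences of different lengths are comparable.
    return [counts[kmer] / windows for kmer in kmers]


# Illustrative sequence fragment (an assumption, not data from the study).
vector = kmer_frequency_vector("GCGGAUUUAGCUCAGUUGGGAGAGC", k=2)
print(len(vector))
```

With k = 2 every sequence maps to a 16-dimensional vector regardless of its length, which is the property a downstream classifier needs.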
This research proposes novel feature extraction mechanisms and uses ensemble models to identify D sites in tRNA sequences. The ensemble models were evaluated using k-fold cross-validation and independent testing.
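As a rough sketch of how k-fold cross-validation partitions a dataset, the code below generates train and test index lists for each fold. The abstract does not state the value of k or how samples were shuffled, so k = 5, the fixed seed, and the helper name `k_fold_indices` are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Hypothetical helper; k and seed are illustrative defaults.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


for fold_number, (train_idx, test_idx) in enumerate(k_fold_indices(10, k=5), start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

Each sample appears in exactly one test fold, so every sequence is used for evaluation once and for training k - 1 times.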
The results showed that the stacking ensemble model outperformed the other ensemble models, achieving 0.98 accuracy, 0.98 specificity, 0.97 sensitivity, and a Matthews correlation coefficient of 0.92. The proposed model, iDHU-Ensem, was also compared with existing predictors on an independent test set, where it achieved higher accuracy than the available predictors.
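For reference, the four reported metrics follow from the standard definitions over a binary confusion matrix. The sketch below computes them; the function name `binary_metrics` and the confusion-matrix counts in the usage line are made-up illustrative values, not results from the study.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity, and MCC from counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Illustrative counts only (assumed, not taken from the paper).
print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

Unlike accuracy, the Matthews correlation coefficient accounts for all four cells of the confusion matrix, which makes it more informative when positive and negative sites are unevenly represented.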
This research enhances D site identification capabilities through computationally intelligent methods. A web-based server, iDHU-Ensem, is available to researchers at https://taseersuleman-idhu-ensem-idhu-ensem.streamlit.app/.