Bilal Anas, Alarfaj Fawaz Khaled, Khan Rafaqat Alam, Suleman Muhammad Taseer, Long Haixia
College of Information Science and Technology, Hainan Normal University, Haikou 571158, China.
Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou 571158, China.
Bioinformatics. 2022 Jan 1;41(1). doi: 10.1093/bioinformatics/btae722.
5-Methylcytosine (m5c) is a modified cytosine base formed by the addition of a methyl group at the carbon-5 position. It is a prevalent post-transcriptional modification (PTM) found in various types of RNA. Traditional laboratory techniques often fail to identify m5c sites rapidly and accurately. With the growing availability of sequence data, computational models offer a more efficient and reliable approach to m5c site detection. This research focused on developing in-silico methods based on ensemble learning. The encoded data were processed by ensemble models, including bagging and boosting techniques, and the models were then evaluated through independent testing and 10-fold cross-validation.
Among the models tested, the bagging-based ensemble predictor, m5c-iEnsem, outperformed existing m5c prediction tools.
To further support the research community, m5c-iEnsem has been made available via a user-friendly web server at https://m5c-iensem.streamlit.app/.
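The general workflow named in the abstract (encode sequences, train a bagging ensemble, score it with 10-fold cross-validation) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not specify the encoding scheme or the base learner, so the one-hot encoding, the single-feature decision stumps, the synthetic toy sequences, and every function name below are assumptions chosen only for illustration.

```python
import random
from collections import Counter

NUCLEOTIDES = "ACGU"

def one_hot(seq):
    """Flatten an RNA sequence into a one-hot vector (4 values per base).
    Assumed encoding; the abstract does not state which one was used."""
    return [1 if base == n else 0 for base in seq for n in NUCLEOTIDES]

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_stump(X, y):
    """Hypothetical base learner: pick the single binary feature (and
    polarity) that makes the fewest errors on the training sample."""
    best = (len(y) + 1, 0, 1)  # (errors, feature index, label when feature == 1)
    for j in range(len(X[0])):
        for pos in (0, 1):
            errors = sum((pos if x[j] else 1 - pos) != t for x, t in zip(X, y))
            if errors < best[0]:
                best = (errors, j, pos)
    return best[1], best[2]

def predict_stump(stump, x):
    j, pos = stump
    return pos if x[j] else 1 - pos

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Bagging: train each base learner on a bootstrap resample
    (sampling with replacement) of the training set."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        sample = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in sample], [y[i] for i in sample]))
    return models

def predict_bagging(models, x):
    """Aggregate the base learners by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]

def cross_validate(X, y, k=10, seed=0):
    """Return overall accuracy across k held-out folds."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        models = fit_bagging([X[i] for i in train], [y[i] for i in train], seed=seed)
        correct += sum(predict_bagging(models, X[i]) == y[i] for i in fold)
    return correct / len(X)

def make_toy_data(n_per_class=20, seed=1):
    """Synthetic 5-nt windows centred on C; the 'G'/'A' marker at position 1
    is an invented signal, not a real m5c motif."""
    rng = random.Random(seed)
    seqs, labels = [], []
    for label, marker in ((1, "G"), (0, "A")):
        for _ in range(n_per_class):
            f = [rng.choice(NUCLEOTIDES) for _ in range(3)]
            seqs.append(f[0] + marker + "C" + f[1] + f[2])
            labels.append(label)
    return seqs, labels

if __name__ == "__main__":
    seqs, labels = make_toy_data()
    X = [one_hot(s) for s in seqs]
    print(f"10-fold CV accuracy on toy data: {cross_validate(X, labels):.2f}")
```

In practice a library implementation (for example a bagging classifier over decision trees with stratified folds) would replace the hand-written pieces. The sketch only makes the resample, vote, and hold-out steps explicit.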