Identification of D Modification Sites Using a Random Forest Model Based on Nucleotide Chemical Properties.

Affiliations

School of Computer Science and Technology, Xidian University, Xi'an 710071, China.

Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China.

Publication information

Int J Mol Sci. 2022 Mar 11;23(6):3044. doi: 10.3390/ijms23063044.

Abstract

Dihydrouridine (D) is an abundant post-transcriptional modification found in transfer RNA from eukaryotes, bacteria, and archaea. D has contributed to treatments for cancer, so precise detection of D modification sites can further the understanding of its functional roles. Traditional experimental techniques for identifying D are laborious and time-consuming, and few computational tools exist for this task. In this study, we used eleven sequence-derived feature extraction methods and implemented five popular machine learning algorithms to identify an optimal model. During preprocessing, the data were partitioned into training and testing sets, and oversampling was applied to reduce the effect of the imbalance between positive and negative samples. The best-performing model combined a random forest classifier with nucleotide chemical property features. The optimized model achieved a sensitivity of 0.9688 and a specificity of 0.9706 in independent tests, surpassing published tools. Furthermore, a series of validations covering several aspects was conducted to demonstrate the robustness and reliability of the model.
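
The abstract names three building blocks: nucleotide chemical property (NCP) features, oversampling, and sensitivity/specificity as evaluation metrics. The following is a minimal sketch of those pieces, not the authors' implementation. It assumes the commonly used three-bit NCP scheme (ring structure, functional group, hydrogen bonding) and plain random oversampling; the abstract does not state which oversampling method was used, and the function names `ncp_encode`, `random_oversample`, and `sensitivity_specificity` are hypothetical.

```python
import random

# Commonly used NCP codes (assumed here, not taken from the abstract):
# (ring structure: purine=1, functional group: amino=1, hydrogen bonding: weak=1)
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def ncp_encode(seq):
    """Flatten an RNA sequence into a binary vector of length 3 * len(seq)."""
    features = []
    for base in seq.upper().replace("T", "U"):
        # Unknown symbols map to all zeros (an assumption of this sketch).
        features.extend(NCP.get(base, (0, 0, 0)))
    return features


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class samples until classes balance."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(X), list(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def sensitivity_specificity(y_true, y_pred):
    """Return (TP / (TP + FN), TN / (TN + FP)) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return sn, sp
```

The encoded vectors could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, with oversampling applied to the training partition only so that the independent test set stays untouched.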

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c48d/8950657/4a4f3b908c16/ijms-23-03044-g001.jpg
