Identification of D Modification Sites Using a Random Forest Model Based on Nucleotide Chemical Properties.

Affiliations

School of Computer Science and Technology, Xidian University, Xi'an 710071, China.

Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China.

Publication information

Int J Mol Sci. 2022 Mar 11;23(6):3044. doi: 10.3390/ijms23063044.

DOI: 10.3390/ijms23063044

PMID: 35328461

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8950657/

Abstract

Dihydrouridine (D) is an abundant post-transcriptional modification present in transfer RNA from eukaryotes, bacteria, and archaea. D has contributed to treatments for cancerous diseases. Therefore, the precise detection of D modification sites can enable further understanding of its functional roles. Traditional experimental techniques to identify D are laborious and time-consuming, and few computational tools exist for such analysis. In this study, we utilized eleven sequence-derived feature extraction methods and implemented five popular machine learning algorithms to identify an optimal model. During data preprocessing, the data were partitioned into training and testing sets. Oversampling was also adopted to reduce the effect of the imbalance between positive and negative samples. The best-performing model was obtained by combining a random forest classifier with nucleotide chemical property encoding. The optimized model achieved a sensitivity of 0.9688 and a specificity of 0.9706 in independent tests, and it surpassed published tools in those tests. Furthermore, a series of validations was conducted to demonstrate the robustness and reliability of our model.
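The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used nucleotide chemical property (NCP) scheme, in which each base is described by three binary values (ring structure, functional group, hydrogen-bond strength), and it assumes plain random duplication for oversampling, since the abstract does not say which oversampling method was used. The function names `ncp_encode`, `random_oversample`, and `sensitivity_specificity` are hypothetical.

```python
import random

# Assumed NCP scheme (widely used in the literature, not confirmed by the abstract):
# (ring structure: purine=1, functional group: amino=1, hydrogen bonding: weak=1)
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def ncp_encode(seq):
    """Flatten an RNA sequence into a list of 3 binary values per base."""
    features = []
    for base in seq.upper().replace("T", "U"):
        # Unknown symbols (e.g. N) are encoded as all zeros in this sketch.
        features.extend(NCP.get(base, (0, 0, 0)))
    return features


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced.

    This is one simple oversampling strategy; apply it to the training split only.
    """
    rng = random.Random(seed)
    by_label = {}
    for row, label in zip(X, y):
        by_label.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_label.values())
    X_out, y_out = [], []
    for label, rows in by_label.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = D site."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return sn, sp
```

The encoded vectors could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, with the sensitivity and specificity computed on a held-out test set that was never oversampled.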


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c48d/8950657/4a4f3b908c16/ijms-23-03044-g001.jpg
