Accurate identification of RNA D modification using multiple features.

Affiliations

School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, Guangdong, China.

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.

Publication information

RNA Biol. 2021 Dec;18(12):2236-2246. doi: 10.1080/15476286.2021.1898160. Epub 2021 Mar 17.

DOI:10.1080/15476286.2021.1898160

PMID:33729104

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8632091/

Abstract

Dihydrouridine (D), one of the common post-transcriptional modifications in tRNAs, plays a prominent role in regulating tRNA flexibility and in cancerous diseases. Because the sequencing techniques used to detect D modification are expensive and time-consuming, precise computational tools can greatly advance research on molecular mechanisms and medical development. We propose a novel predictor, iRNAD_XGBoost, that identifies potential D sites using multiple RNA sequence representations. The method addresses class imbalance with the hybrid sampling method SMOTEENN and builds the model on the top 30 features selected by XGBoost. In the jackknife test, the optimized model achieved high sensitivity (Sn) and specificity (Sp) values of 97.13% and 97.38%, respectively. On the independent test, these two metrics reached 91.67% and 94.74%, respectively. Compared with the iRNAD method, the model showed high generalizability and consistent prediction performance for positive and negative samples, yielding satisfactory Matthews correlation coefficient (MCC) scores of 0.94 and 0.86, respectively. The results suggest that the chemical property and nucleotide density features (CPND), the electron-ion interaction pseudopotential features (EIIP and PseEIIP), and the dinucleotide composition (DNC) are crucial to the recognition of D modification. The proposed predictor is a promising tool to help experimental biologists investigate molecular functions.
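To make the sequence representations named in the abstract more concrete, the following is a minimal sketch of how DNC, EIIP, and CPND features are commonly computed for an RNA sequence. It is not the authors' implementation: the function names (`dnc`, `eiip`, `cpnd`, `encode`) are hypothetical, the EIIP values and the three-bit chemical property codes are the ones conventionally used in the literature, and the paper's exact definitions, window length, and feature ordering may differ.

```python
from itertools import product

ALPHABET = "ACGU"

# Conventional EIIP values per nucleotide (assumed; U takes the value usually given for T).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}

# Conventional chemical property code: (ring structure, functional group, hydrogen bond).
# Purine = 1, amino group = 1, weak hydrogen bonding = 1.
CHEM = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def dnc(seq):
    """Dinucleotide composition: frequency of each of the 16 dinucleotides."""
    total = len(seq) - 1
    if total < 1:
        raise ValueError("sequence must contain at least two nucleotides")
    pairs = ["".join(p) for p in product(ALPHABET, repeat=2)]
    counts = {p: 0 for p in pairs}
    for i in range(total):
        counts[seq[i:i + 2]] += 1  # raises KeyError on non-ACGU characters
    return [counts[p] / total for p in pairs]


def eiip(seq):
    """Replace every nucleotide with its EIIP value."""
    return [EIIP[base] for base in seq]


def cpnd(seq):
    """Chemical property (3 bits) plus nucleotide density for each position.

    The density at position i is the number of times the current nucleotide
    has occurred in the first i positions, divided by i.
    """
    seen = {base: 0 for base in ALPHABET}
    features = []
    for i, base in enumerate(seq, start=1):
        seen[base] += 1
        features.extend(CHEM[base])
        features.append(seen[base] / i)
    return features


def encode(seq):
    """Concatenate DNC, EIIP, and CPND features (order chosen for illustration)."""
    seq = seq.upper().replace("T", "U")
    return dnc(seq) + eiip(seq) + cpnd(seq)
```

For a sequence of length n, this sketch yields 16 DNC values, n EIIP values, and 4n CPND values. In a pipeline like the one the abstract describes, such vectors would then be resampled (SMOTEENN) and ranked by a tree-based model before the top features are kept; those steps are not shown here.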
