Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain.
Institut de Biomedicina (IBUB), Universitat de Barcelona, Av. Diagonal 643, 08028 Barcelona, Spain.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae332.
Around 50 years ago, molecular biology opened the path to understanding changes in form, adaptation, complexity, and the basis of human disease through a myriad of reports on gene birth, gene duplication, gene expression regulation, and splicing regulation, among other mechanisms behind gene function. Here, with the advent of big data and artificial intelligence (AI), we focus on an elusive and intriguing mechanism of gene function regulation, RNA editing, in which a single nucleotide of an RNA molecule is changed, with a remarkable impact on the complexity of the transcriptome and proteome. We present a new-generation approach to assess the functional conservation of the RNA-editing targeting mechanism using two AI algorithms: random forest (RF) and bidirectional long short-term memory (biLSTM) neural networks with an attention layer. These algorithms, combined with RNA-editing data from databases and variant calling from same-individual RNA-seq and DNA-seq experiments in different species, allowed us to predict RNA-editing events from both primary sequence and secondary structure. We then devised a method for assessing conservation or divergence in the molecular mechanisms of editing completely in silico: the cross-testing analysis. This novel method not only helps to understand the conservation of the editing mechanism through evolution but could also set the basis for a better understanding of the adenosine-targeting mechanism in other fields.
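The cross-testing idea can be illustrated with a minimal sketch. The abstract does not spell out the procedure, so this assumes one plausible reading: a model is trained on editing sites from one species and evaluated on sites from every other species, with high cross-species accuracy read as a shared targeting signal. To stay self-contained, a simple per-position log-odds scorer stands in for the RF and biLSTM models, the data are synthetic sequence windows centred on an adenosine, and all names (`train_log_odds`, `cross_test`, `toy_species`, `sp1` to `sp3`) are hypothetical.

```python
import math
import random

NUCS = "ACGU"

def train_log_odds(windows, labels, pseudo=1.0):
    """Fit per-position nucleotide log-odds (edited vs. non-edited).

    A deliberately simple stand-in for the RF / biLSTM models.
    """
    length = len(windows[0])
    counts = {y: [dict.fromkeys(NUCS, pseudo) for _ in range(length)]
              for y in (0, 1)}
    for seq, y in zip(windows, labels):
        for i, nuc in enumerate(seq):
            counts[y][i][nuc] += 1
    model = []
    for i in range(length):
        tot1 = sum(counts[1][i].values())
        tot0 = sum(counts[0][i].values())
        model.append({n: math.log((counts[1][i][n] / tot1) /
                                  (counts[0][i][n] / tot0))
                      for n in NUCS})
    return model

def predict(model, seq):
    """Call a window 'edited' (1) when its summed log-odds is positive."""
    score = sum(col[nuc] for col, nuc in zip(model, seq))
    return 1 if score > 0 else 0

def accuracy(model, windows, labels):
    hits = sum(predict(model, s) == y for s, y in zip(windows, labels))
    return hits / len(labels)

def cross_test(datasets):
    """Train on each species, then test on the held-out set of every species."""
    models = {sp: train_log_odds(*d["train"]) for sp, d in datasets.items()}
    return {(a, b): accuracy(models[a], *datasets[b]["test"])
            for a in datasets for b in datasets}

def toy_species(rng, motif_pos, motif_nuc, n=2000, length=11):
    """Synthetic windows: edited sites usually carry one flanking nucleotide."""
    windows, labels = [], []
    for k in range(n):
        y = k % 2                      # alternate edited / non-edited
        seq = [rng.choice(NUCS) for _ in range(length)]
        seq[length // 2] = "A"         # candidate adenosine at the centre
        if y == 1 and rng.random() < 0.9:
            seq[motif_pos] = motif_nuc
        windows.append("".join(seq))
        labels.append(y)
    half = n // 2
    return {"train": (windows[:half], labels[:half]),
            "test": (windows[half:], labels[half:])}

rng = random.Random(0)
datasets = {
    "sp1": toy_species(rng, 4, "U"),   # sp1 and sp2 share the same toy signal
    "sp2": toy_species(rng, 4, "U"),
    "sp3": toy_species(rng, 6, "G"),   # sp3 uses a different toy signal
}
matrix = cross_test(datasets)
for (train_sp, test_sp), acc in sorted(matrix.items()):
    print(f"train={train_sp} test={test_sp} accuracy={acc:.2f}")
```

In this toy setting, a model trained on one species transfers well to a species that shares its signal and falls to roughly chance level on a species that does not. That asymmetry is the kind of pattern a cross-testing matrix is meant to expose. Real data would replace `toy_species`, and real models would replace the log-odds scorer.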