miRe2e: a full end-to-end deep model based on transformers for prediction of pre-miRNAs.

Affiliation

Informatics Department, Research Institute for Signals, Systems and Computational Intelligence sinc(i) (FICH-UNL/CONICET), Ciudad Universitaria, Santa Fe, Argentina.

Publication information

Bioinformatics. 2022 Feb 7;38(5):1191-1197. doi: 10.1093/bioinformatics/btab823.

Abstract

MOTIVATION

MicroRNAs (miRNAs) are small RNA sequences with key roles in the regulation of gene expression at the post-transcriptional level in different species. Accurate prediction of novel miRNAs is needed because of their importance in many biological processes and their association with complex diseases in humans. Many machine learning approaches have been proposed in the last decade for this purpose, but they require handcrafted feature extraction to identify possible de novo miRNAs. More recently, the emergence of deep learning (DL) has enabled automatic feature extraction, with models learning relevant representations by themselves. However, state-of-the-art deep models still require complex pre-processing of the input sequences and prediction of their secondary structure to reach acceptable performance.

RESULTS

In this work, we present miRe2e, the first full end-to-end DL model for pre-miRNA prediction. The model is based on Transformers, a neural architecture that uses attention mechanisms to infer global dependencies between inputs and outputs. It can receive raw genome-wide data as input, without any pre-processing or feature engineering. After a training stage with known pre-miRNAs, hairpin and non-hairpin sequences, it can identify all the pre-miRNA sequences within a genome. The model was validated through several experimental setups using the human genome and compared with state-of-the-art algorithms, obtaining 10 times better performance.
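
As a rough illustration of the ideas named above (raw sequence in, attention over all positions), the following is a minimal sketch and not the miRe2e implementation. The function names, the window size and step, and the use of identity query/key/value projections are all assumptions made for illustration; a real Transformer learns its projections and stacks many such layers.

```python
import math

NUCLEOTIDES = "ACGU"  # assumed alphabet; T is mapped to U below


def one_hot(seq):
    """Encode a raw sequence as 4-dimensional one-hot vectors.

    Symbols outside ACGU (for example N) become all-zero vectors.
    """
    seq = seq.upper().replace("T", "U")
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]


def sliding_windows(genome, size, step):
    """Yield (start, window) pairs that scan the raw sequence.

    If the sequence is shorter than `size`, a single shorter window is yielded.
    """
    last_start = max(len(genome) - size, 0)
    for start in range(0, last_start + 1, step):
        yield start, genome[start:start + size]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Hypothetical simplification: queries, keys and values are the inputs
    themselves (identity projections), so nothing here is learned.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this position to every position in the window.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Weighted average of all positions: each output sees the whole window.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


if __name__ == "__main__":
    genome = "ACGUACGUGGCA"  # toy sequence, not real data
    for start, window in sliding_windows(genome, size=8, step=4):
        context = self_attention(one_hot(window))
        print(start, window, len(context))
```

Because every output position is a weighted average over the whole window, distant bases can influence each other directly, which is the sense in which attention captures global dependencies.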

AVAILABILITY AND IMPLEMENTATION

A web demo is available at https://sinc.unl.edu.ar/web-demo/miRe2e/ and the source code is available for download at https://github.com/sinc-lab/miRe2e.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
