DiCleave: a deep learning model for predicting human Dicer cleavage sites.

Affiliations

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, 611-0011, Japan.

Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia.

Publication information

BMC Bioinformatics. 2024 Jan 9;25(1):13. doi: 10.1186/s12859-024-05638-4.

DOI:10.1186/s12859-024-05638-4

PMID:38195423

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10775615/

Abstract

BACKGROUND

MicroRNAs (miRNAs) are a class of non-coding RNAs that play a pivotal role as regulators of gene expression. Mature miRNAs are typically about 20 to 25 nucleotides long. Their maturation requires Dicer cleavage at specific sites within the precursor miRNAs (pre-miRNAs). Several machine learning-based approaches to cleavage site prediction have been reported, such as PHDcleav and LBSizeCleav. ReCGBM, a gradient boosting-based model, outperforms the other existing methods. Nonetheless, ReCGBM operates solely as a binary classifier, even though a typical pre-miRNA contains two cleavage sites. Previous approaches have also used only a fraction of the structural information in pre-miRNAs, often overlooking comprehensive secondary structure information. A new model is needed to address these limitations.

RESULTS

In this study, we developed a deep learning model for predicting the presence of a Dicer cleavage site within a pre-miRNA segment. This model was enhanced by an autoencoder that learned the secondary structure embeddings of pre-miRNA. Benchmarking experiments demonstrated that the performance of our model was comparable to that of ReCGBM in the binary classification tasks. In addition, our model excelled in multi-class classification tasks, making it a more versatile and practical solution than ReCGBM.

CONCLUSIONS

Our proposed model exhibited superior performance compared with the current state-of-the-art model, underscoring the effectiveness of a deep learning approach in predicting Dicer cleavage sites. Furthermore, our model could be trained using only sequence and secondary structure information. Its ability to handle multi-class classification tasks enhances its practical utility.
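
To make the input and label framing concrete, the following is a minimal sketch of how a pre-miRNA segment could be encoded from sequence and secondary structure alone, and how a multi-class label relates to a binary one. It is not the authors' implementation. The dot-bracket structure alphabet, the one-hot layout, the three-class label scheme, and all function names are assumptions made for illustration, and the autoencoder and classifier themselves are not shown.

```python
# Illustrative sketch only: the alphabets, label scheme, and function names
# are assumptions, not taken from the DiCleave paper or its code.
from typing import List

SEQ_ALPHABET = "ACGU"      # RNA bases
STRUCT_ALPHABET = "(.)"    # dot-bracket secondary structure symbols

# Hypothetical label schemes for a pre-miRNA segment.
BINARY_LABELS = {"no_cleavage": 0, "cleavage": 1}
MULTICLASS_LABELS = {"no_cleavage": 0, "5p_cleavage": 1, "3p_cleavage": 2}


def one_hot(symbol: str, alphabet: str) -> List[int]:
    """Return a one-hot vector; symbols outside the alphabet map to all zeros."""
    return [1 if symbol == a else 0 for a in alphabet]


def encode_segment(sequence: str, structure: str) -> List[List[int]]:
    """Encode each position as a base one-hot (4) plus a structure one-hot (3)."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return [
        one_hot(base, SEQ_ALPHABET) + one_hot(sym, STRUCT_ALPHABET)
        for base, sym in zip(sequence.upper(), structure)
    ]


def to_binary(multiclass_label: int) -> int:
    """Collapse the three-class label into cleavage vs. no cleavage."""
    if multiclass_label == MULTICLASS_LABELS["no_cleavage"]:
        return BINARY_LABELS["no_cleavage"]
    return BINARY_LABELS["cleavage"]


# Toy hairpin-like segment; both strings are made up for the example.
features = encode_segment("GGAUCC", "((..))")
```

Under this framing, a binary classifier only answers whether a segment contains a cleavage site, while a multi-class classifier also indicates which of the two sites it is, which is the distinction the abstract draws between ReCGBM and the proposed model.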
