College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming, China.
College of Material Science and Engineering, Southwest Forestry University, Kunming, China.
Comput Math Methods Med. 2022 Jul 25;2022:4490154. doi: 10.1155/2022/4490154. eCollection 2022.
MicroRNAs (miRNAs) are noncoding RNAs that play an essential role in gene regulation by binding to messenger RNAs (mRNAs). Accurate and rapid identification of miRNA target genes helps reveal the mechanisms of transcriptome regulation, which is of great significance for the study of cancer and other diseases. Many bioinformatics methods have been proposed for this problem, but previous research has not examined in depth how the nucleotide sequence is encoded. In this paper, inspired by the similarity between natural language and biological sequences, we develop a novel method that combines word embedding and deep learning for site-level prediction of human miRNA targets. First, a word2vec model is used to learn distributed representations of miRNAs and mRNAs. Then, features are extracted automatically from these embeddings by a stacked bidirectional long short-term memory (BiLSTM) network. In our tests, the method improves on other methods in accuracy, sensitivity, specificity, and F-measure. Our results show that distributed representations can improve the accuracy of a deep learning model and better address the miRNA target site prediction problem.
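The analogy between sequences and sentences can be made concrete with a small sketch. The abstract does not say how nucleotide sequences are split into "words" or which word2vec variant is used, so everything below is an assumption for illustration only: overlapping k-mers as tokens, skip-gram style (center, context) pairs as training examples, and the hypothetical names `kmer_tokens`, `skipgram_pairs`, `k`, and `window`.

```python
from typing import List, Tuple


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a nucleotide sequence into overlapping k-mers ("words").

    Assumption: k-mer tokenization with k=3; the source does not state
    the tokenization scheme.
    """
    # Normalise case and map DNA-style T to RNA-style U.
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) pairs of the kind a skip-gram model trains on.

    Assumption: skip-gram rather than CBOW; the source only says "word2vec".
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    example = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative 22-nt RNA sequence
    tokens = kmer_tokens(example, k=3)
    print(tokens[:4])
    print(skipgram_pairs(tokens, window=1)[:3])
```

In a full pipeline, token lists like these would be passed to a word2vec implementation to obtain one vector per k-mer, and the resulting vector sequences would be the input to the stacked BiLSTM classifier.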