Cha Mingyu, Zheng Hansi, Talukder Amlan, Barham Clayton, Li Xiaoman, Hu Haiyan
Department of Computer Science, University of Central Florida, Orlando, FL, USA.
Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL, USA.
Sci Rep. 2021 Mar 11;11(1):5625. doi: 10.1038/s41598-021-85173-x.
MicroRNAs (miRNAs) play important roles in post-transcriptional gene regulation and phenotype development. Understanding the regulation of miRNA genes is critical to understanding gene regulation. One of the challenges in studying miRNA gene regulation is the lack of condition-specific annotation of miRNA transcription start sites (TSSs). Unlike those of protein-coding genes, miRNA TSSs can be tens of thousands of nucleotides away from the precursor miRNAs and are hard to detect with conventional RNA-Seq experiments. A number of studies have attempted to predict miRNA TSSs computationally. However, high-resolution condition-specific miRNA TSS prediction remains a challenging problem. Recently, deep learning models have been successfully applied to various bioinformatics problems but have not been effectively applied to condition-specific miRNA TSS prediction. Here we created a two-stream deep learning model called D-miRT for computational prediction of condition-specific miRNA TSSs ( http://hulab.ucf.edu/research/projects/DmiRT/ ). D-miRT is a natural fit for the integration of low-resolution epigenetic features (DNase-Seq and histone modification data) and high-resolution sequence features. Compared with alternative computational models on different sets of training data, D-miRT outperformed all baseline models and demonstrated high accuracy for condition-specific miRNA TSS prediction tasks. Compared with the most recent approaches for cell-specific miRNA TSS identification, evaluated on cell lines that were unseen during model training, D-miRT also showed superior performance.
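To illustrate what combining high-resolution sequence features with low-resolution epigenetic features can look like at the input level, the following is a minimal sketch in plain Python. It is not the D-miRT implementation: the function names (`one_hot`, `bin_signal`, `two_stream_features`), the bin size, and the choice to simply concatenate the two streams are all assumptions made for illustration, and the neural network that would consume these inputs is omitted entirely.

```python
from typing import List

BASES = "ACGT"  # column order of the one-hot encoding (an assumption)


def one_hot(seq: str) -> List[List[int]]:
    """High-resolution stream: one 4-element vector per nucleotide.

    Bases outside ACGT (for example N) become all zeros.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def bin_signal(signal: List[float], bin_size: int) -> List[float]:
    """Low-resolution stream: average a per-base signal over fixed-width bins.

    The last bin may be shorter than bin_size and is averaged over its own length.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    bins = []
    for start in range(0, len(signal), bin_size):
        window = signal[start:start + bin_size]
        bins.append(sum(window) / len(window))
    return bins


def two_stream_features(seq: str, signal: List[float], bin_size: int) -> List[float]:
    """Flatten the sequence stream and append the binned epigenetic stream.

    Hypothetical stand-in: a real two-stream model would process each stream
    with its own layers before merging them.
    """
    seq_stream = [value for row in one_hot(seq) for value in row]
    epi_stream = bin_signal(signal, bin_size)
    return seq_stream + epi_stream


if __name__ == "__main__":
    # Toy 8-bp window with a made-up per-base accessibility signal.
    features = two_stream_features("ACGTNACG", [0.0, 2.0, 4.0, 6.0, 1.0, 1.0, 3.0, 5.0], 4)
    print(len(features), features[-2:])
```

The sequence stream keeps one vector per base, while the epigenetic stream is summarized per bin, which mirrors the difference in resolution between the two feature types described in the abstract.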