Department of Computer Engineering, University of Peradeniya, Peradeniya, 20400, Sri Lanka.
School of Computing, University of North Florida, Jacksonville, FL, 32224, United States.
Biosystems. 2022 Jun;215-216:104662. doi: 10.1016/j.biosystems.2022.104662. Epub 2022 Mar 16.
MicroRNAs (miRNAs) are small non-coding RNA molecules that control gene expression at the RNA level, although some operate at the DNA level. They typically range from 20 to 24 nucleotides in length and are found in plants, animals, and some viruses. Computational approaches have overcome limitations of the experimental methods and have performed well in identifying miRNAs. Compared with mature miRNAs, precursor miRNAs (pre-miRNAs) are longer and form a hairpin loop structure that provides distinguishing structural features. Therefore, most in silico tools are implemented for pre-miRNA identification. This study presents a multilayer perceptron (MLP) based classifier for plant pre-miRNA identification, implemented using 180 features from sequential, structural, and thermodynamic feature categories. The classifier achieves 92% accuracy, 94% specificity, and 90% sensitivity. We further tested the model on other small non-coding RNA types and obtained 78% accuracy. Furthermore, we introduce a novel dataset for training and testing machine learning models that classify real and pseudo plant pre-miRNAs; it addresses the overlap between the positive training and testing datasets presented in PlantMiRNAPred. The new dataset and the classifier, which can be used with any plant species, are deployed on a freely accessible web server at http://mirnafinder.shyaman.me/.
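As a brief illustration of how the three reported metrics relate to one another, the following minimal sketch computes accuracy, specificity, and sensitivity from confusion-matrix counts. The counts are hypothetical and are not taken from the study; they are chosen only so that a balanced test set of 100 real and 100 pseudo pre-miRNAs reproduces the reported percentages. The names `ConfusionCounts` and `counts` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (real pre-miRNA = positive class)."""
    tp: int  # real pre-miRNAs predicted as real
    fn: int  # real pre-miRNAs predicted as pseudo
    tn: int  # pseudo pre-miRNAs predicted as pseudo
    fp: int  # pseudo pre-miRNAs predicted as real


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of real pre-miRNAs that are correctly identified."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of pseudo pre-miRNAs that are correctly rejected."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all sequences that are classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# Hypothetical counts (an assumption, not data from the study).
counts = ConfusionCounts(tp=90, fn=10, tn=94, fp=6)

print(f"sensitivity = {sensitivity(counts):.2f}")
print(f"specificity = {specificity(counts):.2f}")
print(f"accuracy    = {accuracy(counts):.2f}")
```

On a balanced test set, accuracy is the mean of sensitivity and specificity, which is consistent with the reported 92% lying midway between 90% and 94%. Whether the study's test set was in fact balanced is not stated here.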