Seyeddokht Atefeh, Aslaminejad Ali Asghar, Masoudi-Nejad Ali, Nassiri Mohammadreza, Zahiri Javad, Sadeghi Balal
Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran.
Laboratory of System Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
Avicenna J Med Biotechnol. 2016 Jan-Mar;8(1):36-41.
Piwi-interacting RNAs (piRNAs) are a recently discovered class of small non-coding RNAs (ncRNAs), about 24-32 nucleotides long. They play important roles in germline development, transposon silencing, epigenetic regulation, and protection of the genome from invasive transposable elements, and they are implicated in the pathophysiology of diseases such as cancer. Identifying piRNAs is challenging because they lack conserved sequence and structural elements.
To detect piRNAs, each RNA was encoded with a feature set comprising 8 diverse feature groups, and a Support Vector Machine (SVM) classifier with optimized parameters was used for RNA classification. Classification performance with the optimized feature subsets was considerably higher than that reported in previously published studies.
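As a rough illustration of how an RNA can be turned into a fixed-length numeric vector that an SVM can consume, the sketch below computes k-mer composition and GC content. The abstract does not name the 8 feature groups, so these particular features and the helper names `kmer_composition`, `gc_content` and `encode` are hypothetical choices, not the authors' feature set; structural features (for example, ones derived from a predicted secondary structure) are left out.

```python
from itertools import product

# Hypothetical feature encoding for illustration only: the paper's 8 feature
# groups are not listed in the abstract, so this uses k-mer composition and
# GC content, two common sequence-derived features.
ALPHABET = "ACGU"


def kmer_composition(seq: str, k: int) -> list[float]:
    """Normalized counts of all 4**k RNA k-mers, in lexicographic order."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def gc_content(seq: str) -> float:
    """Fraction of positions that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def encode(seq: str, max_k: int = 2) -> list[float]:
    """Concatenate 1-mer..max_k-mer compositions and GC content into one vector."""
    features: list[float] = []
    for k in range(1, max_k + 1):
        features.extend(kmer_composition(seq, k))
    features.append(gc_content(seq))
    return features
```

For example, `encode("ACGUACGU")` returns a 21-element vector: 4 mononucleotide frequencies, 16 dinucleotide frequencies, and the GC content. Every sequence maps to the same length regardless of its own length, which is what a standard SVM expects.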
Our results showed 98% accuracy, a Matthews correlation coefficient (MCC) of 0.98, and 99% specificity in discriminating piRNAs from other RNAs. The results also show that the proposed method outperforms competing methods.
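The reported metrics can all be derived from a binary confusion matrix with piRNA as the positive class. The following is a minimal sketch using the standard definitions; the function name `binary_metrics` and the counts in the usage example are made up for illustration and are not the paper's results. MCC lies between -1 and +1, and unlike accuracy it stays informative when the two classes differ in size.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient
    from a binary confusion matrix (positive class = piRNA)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on piRNAs
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on non-piRNAs
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any row or column sum is zero; 0.0 is the usual convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

With invented counts, `binary_metrics(tp=90, tn=95, fp=5, fn=10)` gives accuracy 0.925, sensitivity 0.90, specificity 0.95, and an MCC of about 0.851.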
In this paper, a prediction method is proposed for identifying human piRNAs. RNAs are encoded with 48 heterogeneous features (sequence and structural features). To assess the method's performance, a benchmark dataset containing 515 piRNAs and 1206 RNAs of other types was constructed. Our method reached an accuracy of 99% on this benchmark dataset. Our analysis also revealed that structural features contribute most to piRNA prediction.
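The benchmark is imbalanced, with piRNAs making up roughly 30% of the sequences, so accuracy on its own can look high even for a trivial classifier. The short calculation below, using only the class sizes stated above, shows the accuracy of a hypothetical baseline that always predicts the larger class; it is a reference point for reading the reported accuracy, not a method from the paper.

```python
# Class sizes taken from the benchmark description above.
N_PIRNA = 515
N_OTHER = 1206


def majority_baseline_accuracy(n_pos: int, n_neg: int) -> float:
    """Accuracy of a trivial classifier that always predicts the larger class."""
    return max(n_pos, n_neg) / (n_pos + n_neg)


total = N_PIRNA + N_OTHER
print(f"total sequences: {total}")                      # 1721
print(f"piRNA fraction: {N_PIRNA / total:.3f}")         # 0.299
print(f"majority-class baseline accuracy: "
      f"{majority_baseline_accuracy(N_PIRNA, N_OTHER):.3f}")  # 0.701
```

Such a baseline has a specificity of 1.0 but detects no piRNAs at all, and its MCC is 0 by the usual convention, which is why MCC and specificity are reported alongside accuracy.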