a School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China.
b Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.
J Biomol Struct Dyn. 2016;34(1):223-35. doi: 10.1080/07391102.2015.1014422. Epub 2015 Mar 3.
A microRNA (miRNA) is a small non-coding RNA molecule that functions in the transcriptional and post-transcriptional regulation of gene expression. The human genome may encode over 1000 miRNAs. Although still poorly characterized, miRNAs are widely regarded as important regulators of biological processes. Aberrant miRNA expression has been observed in many cancers and other disease states, indicating that miRNAs are deeply implicated in these diseases, particularly in carcinogenesis. It is therefore important, for both basic research and miRNA-based therapy, to discriminate real pre-miRNAs from false ones (such as hairpin sequences with similar stem-loops). In particular, with the avalanche of RNA sequences generated in the post-genomic age, computational sequence-based methods for effectively identifying human pre-miRNAs are highly desirable. Here, we propose a predictor called "iMiRNA-PseDPC", in which RNA sequences are represented by a novel feature vector called "pseudo distance-pair composition" (PseDPC) with 10 types of structure statuses. Rigorous cross-validation on a newly constructed, much larger and more stringent benchmark data-set showed that our approach remarkably outperformed the existing ones in prediction accuracy and efficiency, indicating that the new predictor is quite promising, or at least may serve as a complementary tool to the existing predictors in this area. For the convenience of experimental scientists, a user-friendly web server for the new predictor has been established at http://bioinformatics.hitsz.edu.cn/iMiRNA-PseDPC/, through which users can obtain their results without working through the mathematical details. We anticipate that the new predictor may become a useful high-throughput tool for genome analysis, particularly for large-scale data.
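The abstract names the PseDPC feature vector but does not define it, so the following is only a minimal sketch of what a distance-pair composition over structure statuses could look like, not the authors' implementation. It rests on several assumptions that do not come from the text above: that the 10 structure statuses are the four unpaired nucleotides plus six paired states (A-U, U-A, G-C, C-G, G-U, U-G), that the secondary structure is supplied in dot-bracket notation by an external folding tool, and that the features are normalized frequencies of status pairs separated by a distance d. The function names and the `max_distance` parameter are hypothetical.

```python
# Assumed status alphabet (not stated in the abstract):
# 4 unpaired nucleotides plus 6 paired states = 10 structure statuses.
UNPAIRED = ["A", "C", "G", "U"]
PAIRED = ["A-U", "U-A", "G-C", "C-G", "G-U", "U-G"]
STATUSES = UNPAIRED + PAIRED
INDEX = {status: k for k, status in enumerate(STATUSES)}


def structure_statuses(seq, dot_bracket):
    """Assign one of the 10 assumed statuses to every position of seq."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    partner = {}
    stack = []
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")

    statuses = []
    for i, nt in enumerate(seq):
        status = f"{nt}-{seq[partner[i]]}" if i in partner else nt
        if status not in INDEX:
            raise ValueError(f"unsupported status {status!r} at position {i}")
        statuses.append(status)
    return statuses


def distance_pair_composition(seq, dot_bracket, max_distance=2):
    """Return 10 status frequencies followed by 100 pair frequencies per distance."""
    statuses = structure_statuses(seq, dot_bracket)
    n = len(statuses)

    # Block 0: frequency of each single status.
    features = [0.0] * len(STATUSES)
    for status in statuses:
        features[INDEX[status]] += 1.0 / n

    # Blocks 1..max_distance: frequency of each ordered status pair at distance d.
    for d in range(1, max_distance + 1):
        block = [0.0] * (len(STATUSES) ** 2)
        pairs = n - d
        for i in range(max(pairs, 0)):
            a = INDEX[statuses[i]]
            b = INDEX[statuses[i + d]]
            block[a * len(STATUSES) + b] += 1.0 / pairs
        features.extend(block)
    return features


if __name__ == "__main__":
    vector = distance_pair_composition("GGGAAACCC", "(((...)))", max_distance=2)
    print(len(vector))
```

Under these assumptions the vector has 10 + 100 × `max_distance` entries, and each block sums to 1 for sequences longer than `max_distance`. A vector of this kind could then be passed to any standard classifier; the abstract does not say which one the authors used.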