Oulas Anastasis, Boutla Alexandra, Gkirtzou Katerina, Reczko Martin, Kalantidis Kriton, Poirazi Panayiota
Institute of Molecular Biology and Biotechnology-FORTH, Heraklion, University of Crete, Heraklion, Greece.
Nucleic Acids Res. 2009 Jun;37(10):3276-87. doi: 10.1093/nar/gkp120. Epub 2009 Mar 25.
The majority of existing computational tools rely on sequence homology and/or structural similarity to identify novel microRNA (miRNA) genes. Recently, supervised algorithms have been applied to this problem, taking into account sequence, structure and comparative genomics information. In most of these studies, miRNA gene predictions are rarely supported by experimental evidence, and prediction accuracy remains uncertain. In this work we present a new computational tool (SSCprofiler) that uses a probabilistic method based on Profile Hidden Markov Models to predict novel miRNA precursors. By simultaneously integrating biological features such as sequence, structure and conservation, SSCprofiler achieves 88.95% sensitivity and 84.16% specificity on a large set of human miRNA genes. The trained classifier is used to identify novel miRNA gene candidates located within cancer-associated genomic regions, and the resulting predictions are ranked using expression information from a full genome tiling array. Finally, four of the top-scoring predictions are verified experimentally using northern blot analysis. Our work combines analytical and experimental techniques to show that SSCprofiler is a highly accurate tool that can be used to identify novel miRNA gene candidates in the human genome. SSCprofiler is freely available as a web service at http://www.imbb.forth.gr/SSCprofiler.html.
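The abstract reports classifier performance as sensitivity and specificity. As a reminder of how these two figures are conventionally derived from labelled predictions, here is a minimal Python sketch. The function name `sensitivity_specificity` and the toy labels are assumptions made for illustration only; they are not taken from SSCprofiler or from the paper's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    y_true, y_pred: equal-length sequences of 0/1 values, where 1 marks a
    true miRNA precursor (positive) and 0 a negative example.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")

    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives rejected
    return sensitivity, specificity


# Toy labels (hypothetical, not from the paper): 4 positives, 5 negatives.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(toy_true, toy_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting the two values separately, as the abstract does, shows the trade-off between missed true precursors and false predictions, which a single accuracy figure would hide when positive and negative sets differ in size.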