Indiana University School of Informatics, Indiana University-Purdue University and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
Curr Protein Pept Sci. 2011 Sep;12(6):503-7. doi: 10.2174/138920311796957667.
Evidence is accumulating that small open reading frames (sORFs, <100 codons) play key roles in many important biological processes. Yet they are generally ignored in gene annotation, even though they are far more abundant than genes with more than 100 codons. Here, we demonstrate that popular homology search and codon-index techniques perform poorly for small genes relative to larger genes, while a method dedicated to sORF discovery achieves a level of accuracy similar to that of homology search. This result is largely due to the small dataset of experimentally verified sORFs available for homology search and for training ab initio techniques. It highlights the urgent need for both experimental and computational studies to further advance the accuracy of sORF prediction.
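To make the size threshold concrete, the following is a minimal sketch of how candidate sORFs (fewer than 100 codons) might be enumerated on the forward strand of a DNA sequence. It is not the method evaluated in the paper. The function name `find_sorfs`, the `min_codons` lower bound, and the use of the standard start and stop codons are illustrative assumptions.

```python
# Hypothetical helper, not from the paper: enumerate forward-strand ORFs
# whose coding length falls below the sORF threshold of 100 codons.
START = "ATG"                   # assumption: canonical start codon only
STOPS = {"TAA", "TAG", "TGA"}   # standard genetic code stop codons


def find_sorfs(seq, max_codons=100, min_codons=10):
    """Return (start, end, frame) tuples for ORFs with
    min_codons <= coding codons < max_codons.

    `min_codons` is an illustrative cut-off (an assumption) that
    suppresses trivially short hits.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3    # stop codon not counted
                if min_codons <= n_codons < max_codons:
                    hits.append((start, i + 3, frame))
                start = None                   # close the candidate
    return hits


short_orf = "ATG" + "GCT" * 11 + "TAA"    # 12 coding codons
long_orf = "ATG" + "GCT" * 120 + "TAA"    # 121 coding codons
print(find_sorfs(short_orf))
print(find_sorfs(long_orf))
```

A scan like this only proposes candidates. Because short random sequences contain many such frames by chance, separating real sORFs from spurious ones is the hard step that the homology and codon-index methods discussed above attempt to address.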