Hoff K J, Stanke M
Institut für Mathematik und Informatik, Universität Greifswald, Walther-Rathenau-Str. 47, 17487 Greifswald, Germany.
Curr Opin Insect Sci. 2015 Feb;7:8-14. doi: 10.1016/j.cois.2015.02.008. Epub 2015 Mar 7.
We review software tools for gene prediction, that is, the identification of protein-coding genes and their structure in genome sequences. The approaches discussed include methods based on RNA-Seq and current homology-based methods: comparative gene prediction and spliced alignment of proteins. Many methods require their parameters to be adjusted to the target species or its broader clade; these include ab initio gene finders, integrated approaches with ab initio components, and some aligners. We also review current methods for automatic training in the common case where a bona fide training set of gene structures is not available before annotation.