Borodovsky Mark, Lomsadze Alex
Georgia Institute of Technology, Atlanta, Georgia.
Curr Protoc Microbiol. 2014 Feb 6;32:Unit 1E.7. doi: 10.1002/9780471729259.mc01e07s32.
This unit describes how to use several gene-finding programs from the GeneMark line developed for finding protein-coding ORFs in genomic DNA of prokaryotic species, in genomic DNA of eukaryotic species with intronless genes, in genomes of viruses and phages, and in prokaryotic metagenomic sequences, as well as in EST sequences with spliced-out introns. These bioinformatics tools were demonstrated to have state-of-the-art accuracy, and have been frequently used for gene annotation in novel nucleotide sequences. An additional advantage of these sequence-analysis tools is that the problem of algorithm parameterization is solved automatically, with parameters estimated by iterative self-training (unsupervised training).