Parra Genis, Bradnam Keith, Ning Zemin, Keane Thomas, Korf Ian
UC Davis Genome Center, University of California Davis, Davis, CA, USA.
Nucleic Acids Res. 2009 Jan;37(1):289-97. doi: 10.1093/nar/gkn916. Epub 2008 Nov 28.
Genome sequencing projects have been initiated for a wide range of eukaryotes. A few projects have reached completion, but most exist as draft assemblies. Because one of the main reasons to sequence a genome is to obtain its catalog of genes, an important question is how complete or completable that catalog is in unfinished genomes. To answer this question, we have identified a set of core eukaryotic genes (CEGs) that are extremely highly conserved and that we believe are present in low copy numbers in higher eukaryotes. From an analysis of a phylogenetically diverse set of eukaryotic genome assemblies, we found that the proportion of CEGs mapped in draft genomes provides a useful metric for describing the gene space, and that it complements the commonly used N50 length and x-fold coverage values.
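
The two kinds of metric mentioned above can be illustrated with a minimal sketch. It assumes the contig lengths and the identifiers of the CEGs found in an assembly are already available; the abstract does not describe how CEGs are mapped, so that step is left out. The function names `n50` and `ceg_completeness` are hypothetical and are not taken from the authors' software.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def ceg_completeness(mapped_ceg_ids, all_ceg_ids):
    """Return the fraction of the CEG set that was found in the assembly.

    Assumption: mapped_ceg_ids comes from some upstream gene-mapping
    step that is not shown here.
    """
    reference = set(all_ceg_ids)
    if not reference:
        raise ValueError("all_ceg_ids must not be empty")
    return len(reference & set(mapped_ceg_ids)) / len(reference)


if __name__ == "__main__":
    # Toy values for illustration only, not data from the paper.
    print(n50([2, 3, 4, 5, 6]))
    print(ceg_completeness(["ceg1", "ceg2"], ["ceg1", "ceg2", "ceg3", "ceg4"]))
```

N50 summarizes how contiguous an assembly is, while the CEG fraction estimates how much of the expected gene content is recoverable, so an assembly can score well on one and poorly on the other.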