Mounsey Andrew, Bauer Petra, Hope Ian A
School of Biology, University of Leeds, Leeds, LS2 9JT, United Kingdom.
Genome Res. 2002 May;12(5):770-5. doi: 10.1101/gr.208802.
Only a minority of the genes identified by computer analysis of the Caenorhabditis elegans genome sequence data have been characterized experimentally. We attempted to determine the expression patterns for a random sample of the annotated genes using reporter gene fusions. The success rate was low for genes that had been duplicated recently in evolutionary terms. Analysis of the data suggests that this is not due to conditional or low-level expression. The remaining explanation is that most of the annotated genes in the recently duplicated category are pseudogenes, a proportion corresponding to 20% of all annotated C. elegans genes. Further support for this surprisingly high figure was sought by comparing sequences for families of recently duplicated C. elegans genes. Although this analysis was only preliminary, it found clear evidence of recent inactivation by genetic drift for many genes in the recently duplicated category. At least 4% of the annotated C. elegans genes can be recognized as pseudogenes simply from closer inspection of the sequence data. Lessons learned in identifying pseudogenes in C. elegans could be of value in the annotation of the genomes of other species where, although there may be fewer pseudogenes, they may be harder to detect.