Department for Integrative Evolutionary Biology, Max Planck Institute for Developmental Biology, Max-Planck-Ring 9, 72076, Tübingen, Germany.
BMC Genomics. 2020 Oct 12;21(1):708. doi: 10.1186/s12864-020-07100-0.
Nematode model organisms such as Caenorhabditis elegans and Pristionchus pacificus are powerful systems for studying the evolution of gene function at a mechanistic level. However, identifying P. pacificus orthologs of candidate genes known from C. elegans is complicated by differences in the quality of the two species' gene annotations, a common problem in nematode and invertebrate genomics.
Here, we combine comparative genomic screens for suspicious gene models with community-based curation to further improve the quality of gene annotations in P. pacificus. We extend previous curations of one-to-one orthologs to larger gene families and to orphan genes. Cross-species comparisons of protein lengths, screens for atypical domain combinations, and screens for species-specific orphan genes yielded 4311 candidate genes, which were subjected to community-based curation. Corrections for 2946 gene models were implemented in a new version of the P. pacificus gene annotations. The new annotation set contains 28,896 genes and has a single-copy ortholog completeness of 97.6%.
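The three screens named above can be illustrated with a minimal sketch. It assumes each P. pacificus gene model carries its protein length, the length of its C. elegans ortholog (or none), and an ordered domain architecture; the `GeneModel` type, the function names, the 0.5/2.0 length-ratio cut-offs, the placeholder domain labels, and the rule "no C. elegans ortholog means orphan" are all hypothetical simplifications, not the authors' pipeline or thresholds.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class GeneModel:
    gene_id: str
    length: int                      # protein length (amino acids)
    ortholog_length: Optional[int]   # C. elegans ortholog length; None if no ortholog (assumption)
    domains: Tuple[str, ...]         # ordered domain architecture (hypothetical labels)


def length_outliers(models: Iterable[GeneModel], low: float = 0.5, high: float = 2.0) -> List[str]:
    """Flag models whose length ratio to the ortholog falls outside [low, high] (hypothetical cut-offs)."""
    flagged = []
    for m in models:
        if m.ortholog_length is not None:
            ratio = m.length / m.ortholog_length
            if ratio < low or ratio > high:
                flagged.append(m.gene_id)
    return flagged


def atypical_architectures(models: Iterable[GeneModel],
                           reference: Set[Tuple[str, ...]]) -> List[str]:
    """Flag models whose domain combination never occurs in a reference set of architectures."""
    return [m.gene_id for m in models if m.domains and m.domains not in reference]


def orphans(models: Iterable[GeneModel]) -> List[str]:
    """Simplified orphan screen: genes without any C. elegans ortholog."""
    return [m.gene_id for m in models if m.ortholog_length is None]


def candidate_genes(models: List[GeneModel], reference: Set[Tuple[str, ...]]) -> List[str]:
    """Union of all three screens: the list that would be handed to curators."""
    hits = set(length_outliers(models)) | set(atypical_architectures(models, reference)) | set(orphans(models))
    return sorted(hits)


# Toy data, invented purely for illustration.
REFERENCE = {("DomA",), ("DomA", "DomB")}
models = [
    GeneModel("g1", 300, 310, ("DomA",)),          # plausible model
    GeneModel("g2", 120, 400, ("DomA", "DomB")),   # likely truncated
    GeneModel("g3", 500, None, ()),                # no C. elegans ortholog
    GeneModel("g4", 250, 240, ("DomB", "DomC")),   # unusual domain combination
]

print(candidate_genes(models, REFERENCE))  # ['g2', 'g3', 'g4']
```

Each screen only nominates candidates; as the abstract describes, whether a flagged model is actually wrong is decided during community-based curation, which is why the screens can afford permissive cut-offs.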
Our work demonstrates that comparative genomic screens are effective at identifying suspicious gene models and that community-based approaches scale to improving thousands of gene models. Similar community-based approaches can help to improve the quality of gene annotations in other invertebrate species, including parasitic nematodes.