Crowdsourcing and the feasibility of manual gene annotation: A pilot study in the nematode Pristionchus pacificus.

Affiliation

Max Planck Institute for Developmental Biology, Department for Integrative Evolutionary Biology, Max-Planck-Ring 9, 72076, Tübingen, Germany.

Publication details

Sci Rep. 2019 Dec 11;9(1):18789. doi: 10.1038/s41598-019-55359-5.

Abstract

Nematodes such as Caenorhabditis elegans are powerful systems for studying nearly all aspects of biology. Their species richness, together with the extensive genetic knowledge available for C. elegans, facilitates the evolutionary study of biological functions using reverse genetics. However, the ability to identify orthologs of candidate genes in other species can be hampered by erroneous gene annotations. To improve gene annotation in the nematode model organism Pristionchus pacificus, we performed a genome-wide screen for C. elegans genes with potentially incorrectly annotated P. pacificus orthologs. We initiated a community-based project to manually inspect more than two thousand candidate loci and to propose new gene models based on recently generated Iso-seq and RNA-seq data. In most cases, misannotation of C. elegans orthologs was due to artificially fused gene predictions or completely missing gene models. The community-based curation raised the gene count from 25,517 to 28,036 and increased the single-copy ortholog completeness level from 86% to 97%. This pilot study demonstrates how even small-scale crowdsourcing can drastically improve gene annotations. In the future, similar approaches can be applied to other species, gene sets, and even larger communities, thus making manual annotation of large parts of the genome feasible.
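The two dominant error types named in the abstract (artificially fused gene predictions and completely missing gene models) suggest one simple way such a screen could be framed. The sketch below is only an illustration under stated assumptions and is not the authors' actual pipeline: it assumes a hypothetical table mapping each C. elegans gene to its best-matching P. pacificus gene model (or None when nothing matches), flags C. elegans genes without a match as candidate missing models, and flags P. pacificus models claimed by two or more C. elegans genes as candidate fusions. The function names and gene IDs are hypothetical.

```python
from collections import defaultdict


def flag_candidate_loci(best_hits):
    """Flag C. elegans genes whose P. pacificus orthologs may be misannotated.

    best_hits (hypothetical input): dict mapping a C. elegans gene ID to the ID
    of its best-matching P. pacificus gene model, or None if no match exists.

    Returns (missing, fused):
      missing: sorted C. elegans genes with no P. pacificus match
               (candidate missing gene models)
      fused:   P. pacificus gene models claimed by two or more C. elegans genes,
               mapped to the sorted list of those genes (candidate fusions)
    """
    missing = sorted(ce for ce, pp in best_hits.items() if pp is None)

    # Group C. elegans genes by the P. pacificus model they hit.
    claimed = defaultdict(list)
    for ce, pp in best_hits.items():
        if pp is not None:
            claimed[pp].append(ce)

    fused = {pp: sorted(ces) for pp, ces in claimed.items() if len(ces) >= 2}
    return missing, fused


def completeness(n_complete, n_total):
    """Percentage of single-copy orthologs recovered as complete."""
    return 100.0 * n_complete / n_total


if __name__ == "__main__":
    # Hypothetical gene IDs, for illustration only.
    hits = {
        "ce-gene-A": "pp-model-1",
        "ce-gene-B": "pp-model-1",  # shares a model with ce-gene-A
        "ce-gene-C": None,          # no P. pacificus match found
        "ce-gene-D": "pp-model-2",
    }
    missing, fused = flag_candidate_loci(hits)
    print(missing)  # ['ce-gene-C']
    print(fused)    # {'pp-model-1': ['ce-gene-A', 'ce-gene-B']}

    # Net change in gene count reported in the abstract.
    print(28036 - 25517)  # 2519
```

In this framing, each flagged locus would still need manual inspection against transcript evidence such as Iso-seq and RNA-seq data, since a shared best hit can also reflect a genuine one-to-many orthology rather than a fused prediction.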

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4bf/6906410/17bbd2851d07/41598_2019_55359_Fig1_HTML.jpg