Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America.
Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America.
PLoS Comput Biol. 2021 Oct 28;17(10):e1009463. doi: 10.1371/journal.pcbi.1009463. eCollection 2021 Oct.
Experimental data about gene functions curated from the primary literature have enormous value for research scientists in understanding biology. Using the Gene Ontology (GO), manual curation by experts has provided an important resource for studying gene function, especially within model organisms. Unprecedented expansion of the scientific literature and validation of the predicted proteins have increased both data value and the challenges of keeping pace. Capturing literature-based functional annotations is limited by the ability of biocurators to handle the massive and rapidly growing scientific literature. Within the community-oriented wiki framework for GO annotation called the Gene Ontology Normal Usage Tracking System (GONUTS), we describe an approach to expand biocuration through crowdsourcing with undergraduates. This multiplies the number of high-quality annotations in international databases, enriches our coverage of the literature on normal gene function, and pushes the field in new directions. From an intercollegiate competition judged by experienced biocurators, Community Assessment of Community Annotation with Ontologies (CACAO), we have contributed nearly 5,000 literature-based annotations. Many of those annotations are to organisms not currently well-represented within GO. Over a 10-year history, our community contributors have spurred changes to the ontology not traditionally covered by professional biocurators. The CACAO principle of relying on community members to participate in and shape the future of biocuration in GO is a powerful and scalable model used to promote the scientific enterprise. It also provides undergraduate students with a unique and enriching introduction to critical reading of primary literature and acquisition of marketable skills.