Overview of the gene ontology task at BioCreative IV.

Author information

Mao Yuqing, Van Auken Kimberly, Li Donghui, Arighi Cecilia N, McQuilton Peter, Hayman G Thomas, Tweedie Susan, Schaeffer Mary L, Laulederkind Stanley J F, Wang Shur-Jen, Gobeill Julien, Ruch Patrick, Luu Anh Tuan, Kim Jung-Jae, Chiang Jung-Hsien, Chen Yu-De, Yang Chia-Jung, Liu Hongfang, Zhu Dongqing, Li Yanpeng, Yu Hong, Emadzadeh Ehsan, Gonzalez Graciela, Chen Jian-Ming, Dai Hong-Jie, Lu Zhiyong

Affiliations

National Center for Biotechnology Information (NCBI), National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20817, USA; WormBase, Division of Biology, California Institute of Technology, 1200 E. California Boulevard, Pasadena, CA 91125, USA; TAIR, Department of Plant Biology, The Arabidopsis Information Resource, Carnegie Institution for Science, Stanford, CA 94305, USA; Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA; FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK; Rat Genome Database, Human and Molecular Genetics Center, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226, USA; USDA-ARS Plant Genetics Research Unit and Division of Plant Sciences, Department of Agronomy, University of Missouri, Columbia, MO 65211, USA; HES-SO, HEG, Library and Information Sciences, 7 route de Drize, CH-1227 Carouge, Switzerland; SIBtex, Swiss Institute of Bioinformatics, Rue Michel Servet 1, 1211 Geneva 4, Switzerland; School of Computer Engineering, Nanyang Technological University, Block N4, #02a-32, Nanyang Avenue, Singapore 639798; Department of Computer Science and Information Engineering, National Cheng-Kung University, No. 1, University Rd., Tainan 701, Taiwan, Republic of China; Department of Radiology, Mackay Memorial Hospital, Taitung Branch, Lane 303 Chang Sha St. Taitung, Taiwan, Republic of China; Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA; Department of Computer Science, University of Delaware, 101 Smith Hall, Newark, DE 19716, USA; Department of Quantitative Health Sciences, University of Massachusetts Medical School, 55 Lake Avenue North (AC7-059), Worcester, MA 01655 USA; Department of Biomedical Informatics, Arizona State University, 13212 East Shea Boulevard Scottsdale, AZ 85259 USA; Institute of Information Science, Academia Sinica, 128 Academia Road, Secti

Publication information

Database (Oxford). 2014 Aug 25;2014. doi: 10.1093/database/bau086. Print 2014.

DOI:10.1093/database/bau086

PMID:25157073

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4142793/

Abstract

Gene ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven useful in assisting real-world GO curation. The shortage of sentence-level training data and of opportunities for interaction between text-mining developers and GO curators has limited advances in algorithm development and their use in practice. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade, although much progress is still needed for computer-assisted GO curation. Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation.

DATABASE URL

http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/
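To make the two subtasks in the abstract concrete, the following is a minimal sketch of what a naive dictionary-based baseline could look like. It is not the method of any participating team or of the task organizers. The lexicon `GO_LEXICON`, the gene name `abc-1`, the example text, and all function names are hypothetical and introduced only for illustration; the three GO identifiers are real GO terms, but the trigger phrases attached to them are an assumption.

```python
import re

# Hypothetical toy lexicon: real GO identifiers, assumed trigger phrases.
GO_LEXICON = {
    "GO:0006915": ["apoptotic process", "apoptosis"],
    "GO:0016301": ["kinase activity"],
    "GO:0005634": ["nucleus", "nuclear"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def match_go_terms(sentence):
    """Return the sorted GO IDs whose trigger phrases occur in the sentence."""
    lowered = sentence.lower()
    hits = set()
    for go_id, phrases in GO_LEXICON.items():
        if any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in phrases):
            hits.add(go_id)
    return sorted(hits)


def find_evidence(text, gene):
    """Subtask (i) analogue: sentences naming the gene plus a lexicon phrase."""
    gene_pattern = re.compile(r"\b" + re.escape(gene) + r"\b", re.IGNORECASE)
    evidence = []
    for sentence in split_sentences(text):
        if gene_pattern.search(sentence):
            go_ids = match_go_terms(sentence)
            if go_ids:
                evidence.append((sentence, go_ids))
    return evidence


def annotate_gene(text, gene):
    """Subtask (ii) analogue: union of GO IDs over the gene's evidence sentences."""
    terms = set()
    for _, go_ids in find_evidence(text, gene):
        terms.update(go_ids)
    return sorted(terms)


# Invented example text about a hypothetical gene; not taken from the task data.
article = (
    "The abc-1 protein promotes apoptosis in the developing embryo. "
    "We also sequenced several unrelated loci. "
    "We detected abc-1 kinase activity in the nucleus."
)

for sentence, go_ids in find_evidence(article, "abc-1"):
    print(go_ids, "<-", sentence)
print(annotate_gene(article, "abc-1"))
```

Exact phrase matching of this kind ignores negation, paraphrase, and the GO hierarchy, which is one reason sentence-level evidence text curated by MOD annotators matters for developing and evaluating stronger systems.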

Similar articles

BC4GO: a full-text corpus for the BioCreative IV GO task.

Database (Oxford). 2014 Jul 28;2014. doi: 10.1093/database/bau074. Print 2014.

Overview of the BioCreative III Workshop.

BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S1. doi: 10.1186/1471-2105-12-S8-S1.

Evaluation of BioCreAtIvE assessment of task 2.

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S16. doi: 10.1186/1471-2105-6-S1-S16. Epub 2005 May 24.

An evaluation of GO annotation retrieval for BioCreAtIvE and GOA.

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S17. doi: 10.1186/1471-2105-6-S1-S17. Epub 2005 May 24.

Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine.

Database (Oxford). 2019 Jan 1;2019:bay147. doi: 10.1093/database/bay147.

Closing the loop: from paper to protein annotation using supervised Gene Ontology classification.

Database (Oxford). 2014 Sep 4;2014. doi: 10.1093/database/bau088. Print 2014.

BioCreative III interactive task: an overview.

BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S4. doi: 10.1186/1471-2105-12-S8-S4.

The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text.

BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S3. doi: 10.1186/1471-2105-12-S8-S3.

Overview of the protein-protein interaction annotation extraction task of BioCreative II.

Genome Biol. 2008;9 Suppl 2(Suppl 2):S4. doi: 10.1186/gb-2008-9-s2-s4. Epub 2008 Sep 1.

Cited by

A conceptual framework for human-AI collaborative genome annotation.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf377.

Characterization and automated classification of sentences in the biomedical literature: a case study for biocuration of gene expression and protein kinase activity.

bioRxiv. 2025 Jan 8:2025.01.06.631539. doi: 10.1101/2025.01.06.631539.

Integration of background knowledge for automatic detection of inconsistencies in gene ontology annotation.

Bioinformatics. 2024 Jun 28;40(Suppl 1):i390-i400. doi: 10.1093/bioinformatics/btae246.

Epigenetic changes in sperm are associated with paternal and child quantitative autistic traits in an autism-enriched cohort.

Mol Psychiatry. 2024 Jan;29(1):43-53. doi: 10.1038/s41380-023-02046-7. Epub 2023 Apr 27.

Automatic consistency assurance for literature-based gene ontology annotation.

BMC Bioinformatics. 2021 Nov 25;22(1):565. doi: 10.1186/s12859-021-04479-9.

A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation.

J Med Internet Res. 2021 Aug 9;23(8):e28229. doi: 10.2196/28229.

ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts.

Front Res Metr Anal. 2021 Jul 13;6:674205. doi: 10.3389/frma.2021.674205. eCollection 2021.

Biomarker identification of hepatocellular carcinoma using a methodical literature mining strategy.

Database (Oxford). 2017 Jan 1;2017. doi: 10.1093/database/bax082.

Function Prediction for G Protein-Coupled Receptors through Text Mining and Induction Matrix Completion.

ACS Omega. 2019 Feb 12;4(2):3045-3054. doi: 10.1021/acsomega.8b02454. eCollection 2019 Feb 28.

Accelerating annotation of articles via automated approaches: evaluation of the neXtA5 curation-support tool by neXtProt.

Database (Oxford). 2018 Jan 1;2018:bay129. doi: 10.1093/database/bay129.
