Kim Sun, Islamaj Doğan Rezarta, Chatr-Aryamontri Andrew, Chang Christie S, Oughtred Rose, Rust Jennifer, Batista-Navarro Riza, Carter Jacob, Ananiadou Sophia, Matos Sérgio, Santos André, Campos David, Oliveira José Luís, Singh Onkar, Jonnagaddala Jitendra, Dai Hong-Jie, Su Emily Chia-Yu, Chang Yung-Chun, Su Yu-Chen, Chu Chun-Han, Chen Chien Chin, Hsu Wen-Lian, Peng Yifan, Arighi Cecilia, Wu Cathy H, Vijay-Shanker K, Aydın Ferhat, Hüsünbeyi Zehra Melce, Özgür Arzucan, Shin Soo-Yong, Kwon Dongseop, Dolinski Kara, Tyers Mike, Wilbur W John, Comeau Donald C
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, QC H3C 3J7, Canada.
Database (Oxford). 2016 Sep 1;2016. doi: 10.1093/database/baw121. Print 2016.
BioC is a simple XML format for text, annotations and relations, developed to achieve interoperability in biomedical text processing. Following the success of BioC in BioCreative IV, the BioCreative V BioC track posed a collaborative task: building an assistant system for BioGRID curation. In this paper, we describe the framework of the collaborative BioC task and discuss our findings based on the user survey. The track consisted of eight subtasks, including gene/protein/organism named entity recognition, protein-protein/genetic interaction passage identification and annotation visualization. Using BioC as their data-sharing and communication medium, nine teams from around the world participated and contributed either new methods or improvements to existing tools to address different subtasks of the track. Each team's results were shared in BioC and made available to the other teams as they worked on their own subtasks. In the end, all submitted runs were merged using a machine learning classifier to produce an optimized output. The biocurator assistant system was evaluated by four BioGRID curators in terms of practical usability. The curators' feedback was positive overall and highlighted the user-friendly design and the convenient gene/protein curation tool based on text mining.
Database URL: http://www.biocreative.org/tasks/biocreative-v/track-1-bioc/
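To make the data-sharing format concrete, the following is a minimal sketch of how annotations in a BioC-style XML document might be read with Python's standard library. The sample document, its sentence, the annotation ids and the helper name `read_annotations` are all invented for illustration and are not taken from the track data; only the element layout (collection, document, passage, annotation, infon, location) follows the general BioC structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical BioC-style sample; the sentence and annotations are invented.
SAMPLE = """<collection>
  <source>example</source>
  <date>2016-09-01</date>
  <key>example.key</key>
  <document>
    <id>doc1</id>
    <passage>
      <offset>0</offset>
      <text>BRCA1 interacts with BARD1.</text>
      <annotation id="T1">
        <infon key="type">Gene</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
      <annotation id="T2">
        <infon key="type">Gene</infon>
        <location offset="21" length="5"/>
        <text>BARD1</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return (document id, annotation id, type, covered text) tuples."""
    root = ET.fromstring(xml_text)
    results = []
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        for passage in document.iter("passage"):
            # Annotation offsets are document-level, so subtract the
            # passage offset to index into the passage text.
            passage_offset = int(passage.findtext("offset"))
            passage_text = passage.findtext("text") or ""
            for ann in passage.iter("annotation"):
                loc = ann.find("location")
                start = int(loc.get("offset")) - passage_offset
                end = start + int(loc.get("length"))
                infons = {i.get("key"): i.text for i in ann.iter("infon")}
                results.append(
                    (doc_id, ann.get("id"), infons.get("type"),
                     passage_text[start:end])
                )
    return results


for row in read_annotations(SAMPLE):
    print(row)
```

Because every tool reads and writes the same element layout, one team's named entity output can in principle be consumed by another team's passage classifier without a format conversion step, which is the kind of interoperability the track relied on.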