Shin Soo-Yong, Kim Sun, Wilbur W John, Kwon Dongseop
Department of Biomedical Informatics, Asan Medical Center, Seoul 05505, Korea.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Database (Oxford). 2016 Aug 10;2016. doi: 10.1093/database/baw106. Print 2016.
BioC is an XML-based format designed to provide interoperability for text mining tools and manual curation results. A challenge of BioC as a standard format is aligning annotations from multiple systems. Ideally, this should not be a major problem if users follow the guidelines given by BioC key files. Nevertheless, misalignment between text and annotations happens quite often, because different systems tend to use different software development environments, e.g. ASCII vs. Unicode. We first implemented the BioC Viewer to assist BioGRID curators as part of the BioCreative V BioC track (Collaborative Biocurator Assistant Task). For the BioC track, the BioC Viewer helped curate protein-protein interaction and genetic interaction pairs appearing in full-text articles. Here, we describe the BioC Viewer itself as well as the improvements made to it since the BioCreative V Workshop to address the misalignment of BioC annotations. When BioC files are uploaded, a BioC merge process is offered if there are multiple files from the same full-text article. If there is a mismatch between an annotated offset and the text, the BioC Viewer adjusts the offset so that it correctly aligns with the text. The BioC Viewer has a user-friendly interface, where most operations can be performed within a few mouse clicks. Feedback from BioGRID curators on the web interface has been positive, particularly regarding its usability and learnability.
Database URL: http://viewer.bioqrator.org.
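The abstract does not describe how the merge step or the offset adjustment is implemented. The Python sketch below illustrates one plausible approach using only the standard library's `xml.etree.ElementTree`, assuming the usual BioC XML layout (collection, document, passage, annotation, with document-level `location` offsets). The functions `realign` and `merge`, the nearest-occurrence heuristic, the `m<n>` id scheme, and the sample data (`PMC1`, the two collections) are hypothetical illustrations, not the BioC Viewer's actual code. The sample reproduces the ASCII vs. Unicode failure mode: system B counts UTF-8 bytes, so the two-byte `α` pushes its `CTNNB1` span one position to the right.

```python
import copy
import xml.etree.ElementTree as ET


def realign(ann: ET.Element, p_text: str, p_off: int) -> None:
    """Snap <location> onto the annotation's own <text> (hypothetical heuristic:
    keep a matching span, else move to the nearest occurrence of the string)."""
    want = ann.findtext("text", "")
    loc = ann.find("location")
    if loc is None or not want:
        return
    start = int(loc.get("offset", "0")) - p_off  # BioC offsets are document-level
    if start >= 0 and p_text[start:start + len(want)] == want:
        best = start
    else:
        hits = [k for k in range(len(p_text)) if p_text.startswith(want, k)]
        if not hits:
            return  # unresolvable: leave the annotation untouched
        best = min(hits, key=lambda h: abs(h - start))
    loc.set("offset", str(best + p_off))
    loc.set("length", str(len(want)))


def _key(ann: ET.Element) -> tuple:
    """Identity used to drop exact duplicates: type, span and covered text."""
    loc = ann.find("location")
    span = (None, None) if loc is None else (loc.get("offset"), loc.get("length"))
    return (ann.findtext("infon[@key='type']"), *span, ann.findtext("text"))


def merge(base: ET.Element, other: ET.Element) -> ET.Element:
    """Copy `base`, adding `other`'s annotations to passages matched by document
    <id> and passage <offset>; incoming spans are realigned to the base text,
    exact duplicates are dropped, and new ids use a hypothetical "m<n>" scheme."""
    merged = copy.deepcopy(base)
    docs = {d.findtext("id"): d for d in merged.findall("document")}
    for odoc in other.findall("document"):
        bdoc = docs.get(odoc.findtext("id"))
        if bdoc is None:
            continue
        used = {a.get("id") for a in bdoc.iter("annotation")}
        passages = {p.findtext("offset"): p for p in bdoc.findall("passage")}
        for opass in odoc.findall("passage"):
            bpass = passages.get(opass.findtext("offset"))
            if bpass is None:
                continue
            p_text = bpass.findtext("text", "")
            p_off = int(bpass.findtext("offset", "0"))
            seen = {_key(a) for a in bpass.findall("annotation")}
            for ann in opass.findall("annotation"):
                new = copy.deepcopy(ann)
                realign(new, p_text, p_off)
                if _key(new) in seen:
                    continue
                n = 0
                while f"m{n}" in used:
                    n += 1
                new.set("id", f"m{n}")
                used.add(f"m{n}")
                seen.add(_key(new))
                # BioC places annotations before any <relation> in a passage.
                kids = list(bpass)
                pos = next((k for k, c in enumerate(kids) if c.tag == "relation"), len(kids))
                bpass.insert(pos, new)
    return merged


# System A counts characters; system B counts UTF-8 bytes, so "α" shifts B's spans.
BASE = """<collection><source>A</source><date/><key/>
<document><id>PMC1</id><passage><offset>0</offset>
<text>The α-catenin binds CTNNB1.</text>
<annotation id="a1"><infon key="type">Gene</infon>
<location offset="4" length="9"/><text>α-catenin</text></annotation>
</passage></document></collection>"""

OTHER = """<collection><source>B</source><date/><key/>
<document><id>PMC1</id><passage><offset>0</offset>
<text>The α-catenin binds CTNNB1.</text>
<annotation id="a1"><infon key="type">Gene</infon>
<location offset="4" length="10"/><text>α-catenin</text></annotation>
<annotation id="a2"><infon key="type">Gene</infon>
<location offset="21" length="6"/><text>CTNNB1</text></annotation>
</passage></document></collection>"""

if __name__ == "__main__":
    merged = merge(ET.fromstring(BASE), ET.fromstring(OTHER))
    for a in merged.iter("annotation"):
        loc = a.find("location")
        print(a.get("id"), a.findtext("text"), loc.get("offset"), loc.get("length"))
    # a1 α-catenin 4 9
    # m0 CTNNB1 20 6
```

Matching on the annotation's own covered text is cheap and tolerates small offset shifts such as byte-vs-character counting, but it picks the wrong occurrence when the same string appears twice near the stated offset, and it leaves an annotation unchanged when the two systems' texts differ (for example, `α` rendered as `alpha`).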