National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA.
School of Software Convergence, Myongji University, Seoul 03674, South Korea.
Nucleic Acids Res. 2020 Jul 2;48(W1):W5-W11. doi: 10.1093/nar/gkaa333.
Manually annotated data is key to developing text-mining and information-extraction algorithms. However, human annotation requires considerable time, effort and expertise. Given the rapid growth of biomedical literature, it is essential to build tools that speed up annotation while maintaining expert-level quality. While existing text annotation tools may provide user-friendly interfaces to domain experts, they offer limited support for figure display, project management, and multi-user team annotation. In response, we developed TeamTat (https://www.teamtat.org), a web-based annotation tool (local setup available) equipped to manage team annotation projects engagingly and efficiently. TeamTat is a novel tool for managing multi-user, multi-label document annotation that reflects the entire production life cycle. Project managers can specify the annotation schema for entities and relations, select annotators, and distribute documents anonymously to prevent bias. Input documents can be plain text, PDF or BioC (uploaded locally or automatically retrieved from PubMed/PMC), and the output format is BioC with inline annotations. TeamTat displays figures from the full text for the annotator's convenience. Multiple users can work on the same document independently in their own workspaces, and the team manager can track task completion. TeamTat provides corpus quality assessment via inter-annotator agreement statistics, and offers a user-friendly interface for reviewing annotations and resolving inter-annotator disagreements to improve corpus quality.
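The abstract mentions inter-annotator agreement statistics and disagreement resolution but does not say which statistic TeamTat reports. As a minimal sketch of the general idea, assuming exact-match comparison of entity spans and an F1-style agreement score (a common choice for span annotation, not confirmed as TeamTat's method), two annotators' work on one document could be compared as follows. The `Span` type, the `pairwise_agreement` function and the sample annotations are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class Span(NamedTuple):
    """One entity annotation: character offset, length, and entity type."""
    offset: int
    length: int
    label: str


def pairwise_agreement(a: set[Span], b: set[Span]) -> dict:
    """Compare two annotators' span sets on the same document.

    Two annotations agree only if offset, length and label all match
    (an assumed exact-match criterion).
    """
    matched = a & b
    if not a and not b:
        # Neither annotator marked anything; treated here as full agreement.
        f1 = 1.0
    else:
        # F1 between two sets reduces to 2 * |A ∩ B| / (|A| + |B|).
        f1 = 2 * len(matched) / (len(a) + len(b))
    return {
        "f1": f1,
        "matched": matched,
        "only_a": a - b,  # candidates for disagreement review
        "only_b": b - a,
    }


# Hypothetical annotations from two annotators on one document.
annotator_a = {
    Span(0, 5, "Gene"),
    Span(20, 7, "Disease"),
    Span(40, 4, "Chemical"),
}
annotator_b = {
    Span(0, 5, "Gene"),
    Span(20, 7, "Chemical"),  # same text span, different label
    Span(40, 4, "Chemical"),
    Span(60, 3, "Gene"),
}

result = pairwise_agreement(annotator_a, annotator_b)
print(f"agreement F1: {result['f1']:.3f}")
for span in sorted(result["only_a"] | result["only_b"]):
    print("needs review:", span)
```

The spans found in only one annotator's set are the ones a reviewer would need to look at during disagreement resolution; a looser criterion (for example, overlapping spans with the same label) would give a higher score on the same data.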