Pérez-Pérez Martín, Glez-Peña Daniel, Fdez-Riverola Florentino, Lourenço Anália
ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain.
ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain; Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal.
Comput Methods Programs Biomed. 2015 Feb;118(2):242-51. doi: 10.1016/j.cmpb.2014.11.005. Epub 2014 Nov 25.
Document annotation is a key task in the development of Text Mining methods and applications. High-quality annotated corpora are invaluable, but their preparation requires considerable resources and time. Although existing annotation tools offer domain experts good user interfaces, their project management and quality control capabilities are still limited. The current work therefore introduces Marky, a new Web-based document annotation tool equipped to manage multi-user and iterative projects, and to evaluate annotation quality throughout the project life cycle.
At its core, Marky is a Web application based on the open source CakePHP framework. The user interface relies on HTML5 and CSS3 technologies. The Rangy library assists in the browser-independent implementation of common DOM range and selection tasks, and Ajax and jQuery technologies are used to enhance user-system interaction.
Marky provides robust management of inter- and intra-annotator work. Most notably, its annotation tracking system supports systematic and on-demand agreement analysis and annotation amendment. Each annotator may work on documents as usual, but all the annotations made are saved by the tracking system and can be compared later. The project administrator can thus evaluate annotation consistency among annotators and across rounds of annotation, while annotators can reject or amend subsets of annotations made in previous rounds. As a side effect, the tracking system minimises resource and time consumption.
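To make the agreement analysis described above concrete, the following is a minimal sketch in Python, not Marky's own code. It assumes annotations are compared by exact match on document, character offsets and label, and it assumes an F-score as the agreement measure; the abstract does not state which measure Marky uses. The names `Annotation`, `agreement` and `round_changes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotated span (hypothetical structure, not Marky's schema)."""
    doc_id: str
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends
    label: str   # annotation type, e.g. "Gene"


def agreement(reference: set, candidate: set) -> dict:
    """Exact-match precision, recall and F-score of candidate vs. reference.

    Assumption: two annotations agree only if document, offsets and label
    are all identical.
    """
    shared = len(reference & candidate)
    precision = shared / len(candidate) if candidate else 0.0
    recall = shared / len(reference) if reference else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


def round_changes(previous: set, current: set) -> dict:
    """Annotations kept, dropped and added between two annotation rounds."""
    return {
        "kept": previous & current,
        "dropped": previous - current,
        "added": current - previous,
    }


# Two annotators working on the same document (made-up example data).
annotator_a = {
    Annotation("doc1", 0, 5, "Gene"),
    Annotation("doc1", 10, 18, "Protein"),
    Annotation("doc1", 20, 25, "Gene"),
}
annotator_b = {
    Annotation("doc1", 0, 5, "Gene"),
    Annotation("doc1", 10, 18, "Gene"),  # same span, different label
}

scores = agreement(annotator_a, annotator_b)
changes = round_changes(annotator_a, annotator_b)
print(scores)
print({name: len(items) for name, items in changes.items()})
```

The same `agreement` function covers both cases mentioned in the text: passing two annotators' sets gives inter-annotator agreement, while passing one annotator's sets from two rounds gives intra-annotator agreement. Exact matching is the strictest choice; a relaxed variant that accepts overlapping spans would be an equally reasonable assumption.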
Marky is a novel environment for managing multi-user and iterative document annotation projects. Compared to other tools, Marky offers a similar visually intuitive annotation experience while providing unique means to minimise annotation effort and enforce annotation quality, and therefore corpus consistency. Marky is freely available for non-commercial use at http://sing.ei.uvigo.es/marky.