Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA.
BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S4. doi: 10.1186/1471-2105-12-S8-S4.
The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow and adoption by curators. Thus in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested.
A User Advisory Group (UAG) actively participated in the IAT design and assessment. The task focused on gene normalization (identifying gene mentions in the article and linking these genes to standard database identifiers), gene ranking based on the overall importance of each gene mentioned in the article, and gene-oriented document retrieval (identifying full text papers relevant to a selected gene). Six systems participated and all processed and displayed the same set of articles. The articles were selected based on content known to be problematic for curation, such as ambiguity of gene names, coverage of multiple genes and species, or introduction of a new gene name. Members of the UAG curated three articles for training and assessment purposes, and each member was assigned a system to review. A questionnaire related to the interface usability and task performance (as measured by precision and recall) was answered after systems were used to curate articles. Although the limited number of articles analyzed and users involved in the IAT experiment precluded rigorous quantitative analysis of the results, a qualitative analysis provided valuable insight into some of the problems encountered by users when using the systems. The overall assessment indicates that the system usability features appealed to most users, but the system performance was suboptimal (mainly due to low accuracy in gene normalization). In the gene normalization task, failures of species identification and gene name ambiguity led to extensive lists of gene identifiers to review, which, in some cases, did not contain the relevant genes. Document retrieval suffered from the same shortfalls. The UAG favored achieving high performance (measured by precision and recall), but strongly recommended the addition of features that facilitate the identification of the correct gene and its identifier, such as contextual information to assist in disambiguation.
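As an illustration of the precision and recall measures mentioned above, the following is a minimal sketch of how they might be computed for gene normalization on a single article, treating the curated identifiers as a gold standard. The function name and the gene identifiers are hypothetical and are not taken from the IAT evaluation itself.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of gene identifiers.

    Assumption: both measures are computed over the sets of unique
    identifiers for one article; this is an illustrative convention,
    not the documented IAT scoring procedure.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
gold_ids = {"GENE:1", "GENE:2", "GENE:3", "GENE:4"}
system_ids = {"GENE:1", "GENE:2", "GENE:9"}

precision, recall = precision_recall(system_ids, gold_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the system returns two correct identifiers out of three proposed (precision about 0.67) and finds two of the four curated genes (recall 0.50). A long candidate list that omits the relevant gene, as described above, would lower both measures.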
The IAT was an informative exercise that advanced the dialog between curators and developers and increased the appreciation of challenges faced by each group. A major conclusion was that the intended users should be actively involved in every phase of software development, and this will be strongly encouraged in future tasks. The IAT provides the first steps toward the definition of metrics and functional requirements that are necessary for designing a formal evaluation of interactive curation systems in the BioCreative IV challenge.