
BioCreative III interactive task: an overview.

Affiliation

Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA.

Publication information

BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S4. doi: 10.1186/1471-2105-12-S8-S4.

DOI:10.1186/1471-2105-12-S8-S4
PMID:22151968
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3269939/
Abstract

BACKGROUND

The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow and adoption by curators. Thus in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested.

RESULTS

A User Advisory Group (UAG) actively participated in the IAT design and assessment. The task focused on gene normalization (identifying gene mentions in the article and linking these genes to standard database identifiers), gene ranking based on the overall importance of each gene mentioned in the article, and gene-oriented document retrieval (identifying full text papers relevant to a selected gene). Six systems participated and all processed and displayed the same set of articles. The articles were selected based on content known to be problematic for curation, such as ambiguity of gene names, coverage of multiple genes and species, or introduction of a new gene name. Members of the UAG curated three articles for training and assessment purposes, and each member was assigned a system to review. A questionnaire related to the interface usability and task performance (as measured by precision and recall) was answered after systems were used to curate articles. Although the limited number of articles analyzed and users involved in the IAT experiment precluded rigorous quantitative analysis of the results, a qualitative analysis provided valuable insight into some of the problems encountered by users when using the systems. The overall assessment indicates that the system usability features appealed to most users, but the system performance was suboptimal (mainly due to low accuracy in gene normalization). Some of the issues included failure of species identification and gene name ambiguity in the gene normalization task, leading to an extensive list of gene identifiers to review, which, in some cases, did not contain the relevant genes. The document retrieval suffered from the same shortfalls. The UAG favored achieving high performance (measured by precision and recall), but strongly recommended the addition of features that facilitate the identification of the correct gene and its identifier, such as contextual information to assist in disambiguation.
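The abstract reports task performance in terms of precision and recall but does not spell out the computation. A minimal sketch of how these two measures could be computed for gene normalization on a single article follows, assuming that a system's output and the curator's gold standard are each treated as a set of database identifiers. The function name and the identifiers are hypothetical and chosen only for illustration; they are not taken from the BioCreative III evaluation itself.

```python
def precision_recall(predicted_ids, gold_ids):
    """Return (precision, recall) for one article.

    predicted_ids: gene identifiers proposed by a system (hypothetical input)
    gold_ids: gene identifiers chosen by a curator (hypothetical input)
    """
    predicted = set(predicted_ids)
    gold = set(gold_ids)
    true_positives = len(predicted & gold)
    # Precision: fraction of proposed identifiers that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold-standard identifiers that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up identifiers, for illustration only.
gold = {"GENE:1", "GENE:2", "GENE:3", "GENE:4"}
predicted = {"GENE:1", "GENE:2", "GENE:9"}
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this set-based reading, the problem described above, a long candidate list that sometimes omits the relevant gene, would show up as low precision (many incorrect identifiers to review) together with reduced recall (the correct identifier is missing).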

DISCUSSION

The IAT was an informative exercise that advanced the dialog between curators and developers and increased the appreciation of challenges faced by each group. A major conclusion was that the intended users should be actively involved in every phase of software development, and this will be strongly encouraged in future tasks. The IAT provides the first steps toward the definition of metrics and functional requirements that are necessary for designing a formal evaluation of interactive curation systems in the BioCreative IV challenge.
