An overview of the BioCreative 2012 Workshop Track III: interactive text mining task.

Affiliation

Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA.

Publication information

Database (Oxford). 2013 Jan 17;2013:bas056. doi: 10.1093/database/bas056. Print 2013.

DOI:10.1093/database/bas056
PMID:23327936
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3625048/

Abstract

In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. 
The analysis of this survey highlights how important task completion is to the biocurators' overall experience of a system, regardless of the system's high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.
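
The abstract names three kinds of measurements: curation speed-up relative to manual curation, annotation accuracy against a manually curated set, and inter-annotator agreement. The sketch below shows one common way such quantities can be computed. It is not taken from the paper. The abstract does not say which agreement statistic or accuracy formulas were used, so Cohen's kappa and set-based precision/recall/F1 are assumptions, and all function names and sample values are hypothetical.

```python
from collections import Counter


def fold_speedup(manual_minutes, assisted_minutes):
    """Ratio of manual to tool-assisted curation time (2.0 means twice as fast)."""
    if assisted_minutes <= 0:
        raise ValueError("assisted_minutes must be positive")
    return manual_minutes / assisted_minutes


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 of annotations against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items in the same order.

    Assumption: the paper may have measured agreement differently.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the calculations.
    print(fold_speedup(manual_minutes=50, assisted_minutes=25))
    print(precision_recall_f1({"geneA", "geneB", "geneC"}, {"geneB", "geneC", "geneD"}))
    print(cohen_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))
```

Under these definitions, a fold value between 1.7 and 2.5 corresponds to the range of speed-ups the abstract reports for some systems. Kappa corrects raw agreement for chance, which matters when one label dominates the annotations.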

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2dc8/3625048/74ec13ef8ea4/bas056f1p.jpg

