Can we replace curation with information extraction software?

Author information

Karp Peter D

Affiliation

Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, CA 94025, USA. Tel: 650-859-4358; Fax: 650-859-3735; E-mail:

Publication information

Database (Oxford). 2016 Dec 26;2016. doi: 10.1093/database/baw150. Print 2016.

Abstract

Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current information extraction programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential, based on a review of recent tools that enhance curator productivity, but a full cost-benefit analysis for these tools is lacking. Without such an analysis, it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs. Database URL.
