Can we replace curation with information extraction software?

Author information

Karp Peter D

Affiliation

Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, CA 94025, USA. Tel: 650-859-4358; Fax: 650-859-3735; E-mail:

Publication information

Database (Oxford). 2016 Dec 26;2016. doi: 10.1093/database/baw150. Print 2016.

DOI:10.1093/database/baw150
PMID:28025341
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5199131/
Abstract

Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current information extraction programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential, based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis, it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs. Database URL.

Similar articles

1. Can we replace curation with information extraction software?
   Database (Oxford). 2016 Dec 26;2016. doi: 10.1093/database/baw150. Print 2016.
2. Construction of biological networks from unstructured information based on a semi-automated curation workflow.
   Database (Oxford). 2015 Jun 17;2015:bav057. doi: 10.1093/database/bav057.
3. Accelerating annotation of articles via automated approaches: evaluation of the neXtA5 curation-support tool by neXtProt.
   Database (Oxford). 2018 Jan 1;2018:bay129. doi: 10.1093/database/bay129.
4. Strategies towards digital and semi-automated curation in RegulonDB.
   Database (Oxford). 2017 Jan 1;2017(1). doi: 10.1093/database/bax012.
5. Assisting manual literature curation for protein-protein interactions using BioQRator.
   Database (Oxford). 2014 Jul 22;2014. doi: 10.1093/database/bau067. Print 2014.
6. Text mining meets community curation: a newly designed curation platform to improve author experience and participation at WormBase.
   Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa006.
7. Overview of the interactive task in BioCreative V.
   Database (Oxford). 2016 Sep 1;2016. doi: 10.1093/database/baw119. Print 2016.
8. BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID.
   Database (Oxford). 2016 Sep 1;2016. doi: 10.1093/database/baw121. Print 2016.
9. The BioC-BioGRID corpus: full text articles annotated for curation of protein-protein and genetic interactions.
   Database (Oxford). 2017 Jan 10;2017. doi: 10.1093/database/baw147. Print 2017.
10. LiverCancerMarkerRIF: a liver cancer biomarker interactive curation system combining text mining and expert annotations.
   Database (Oxford). 2014 Aug 27;2014. doi: 10.1093/database/bau085. Print 2014.

Cited by

1. An improved dataset of force fields, electronic and physicochemical descriptors of metabolic substrates.
   Sci Data. 2024 Aug 27;11(1):929. doi: 10.1038/s41597-024-03707-0.
2. MetaSpot: A General Approach for Recognizing the Reactive Atoms Undergoing Metabolic Reactions Based on the MetaQSAR Database.
   Int J Mol Sci. 2023 Jul 4;24(13):11064. doi: 10.3390/ijms241311064.
3. BLAB2CancerKD: a knowledge graph database focusing on the association between lactic acid bacteria and cancer, but beyond.
   Database (Oxford). 2023 May 23;2023. doi: 10.1093/database/baad036.
4. New reasons for biologists to write with a formal language.
   Database (Oxford). 2022 Jun 3;2022. doi: 10.1093/database/baac039.
5. MetaClass, a Comprehensive Classification System for Predicting the Occurrence of Metabolic Reactions Based on the MetaQSAR Database.
   Molecules. 2021 Sep 27;26(19):5857. doi: 10.3390/molecules26195857.
6. Data Management and Modeling in Plant Biology.
   Front Plant Sci. 2021 Sep 3;12:717958. doi: 10.3389/fpls.2021.717958. eCollection 2021.
7. MetaTREE, a Novel Database Focused on Metabolic Trees, Predicts an Important Detoxification Mechanism: The Glutathione Conjugation.
   Molecules. 2021 Apr 6;26(7):2098. doi: 10.3390/molecules26072098.
8. ThermoScan: Semi-automatic Identification of Protein Stability Data From PubMed.
   Front Mol Biosci. 2021 Mar 25;8:620475. doi: 10.3389/fmolb.2021.620475. eCollection 2021.
9. Leveraging Curation Among Pathway/Genome Databases Using Ortholog-Based Annotation Propagation.
   Front Microbiol. 2021 Mar 8;12:614355. doi: 10.3389/fmicb.2021.614355. eCollection 2021.
10. Metabolic networks of the Nicotiana genus in the spotlight: content, progress and outlook.
   Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa136.
