Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges.

Author information

Singhal Ayush, Leaman Robert, Catlett Natalie, Lemberger Thomas, McEntyre Johanna, Polson Shawn, Xenarios Ioannis, Arighi Cecilia, Lu Zhiyong

Affiliations

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Selventa, Cambridge, MA 02140, USA.

Publication information

Database (Oxford). 2016 Dec 26;2016. doi: 10.1093/database/baw161. Print 2016.

DOI:10.1093/database/baw161

PMID:28025348

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5199160/

Abstract

Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large-scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions, including (i) the 'scalability' issue arising from the increasing need to mine information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue of applying trained systems to text genres not seen during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefit, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.
