Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges.

Author information

Singhal Ayush, Leaman Robert, Catlett Natalie, Lemberger Thomas, McEntyre Johanna, Polson Shawn, Xenarios Ioannis, Arighi Cecilia, Lu Zhiyong

Affiliations

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Selventa, Cambridge, MA 02140, USA.

Publication information

Database (Oxford). 2016 Dec 26;2016. doi: 10.1093/database/baw161. Print 2016.

Abstract

Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large-scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions, including (i) the 'scalability' issue, arising from the increasing need to mine information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue, i.e. the difficulty of applying trained systems to text genres not seen during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefits, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.

Figure image: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b279/5199160/4fffbab8d30d/baw161f1p.jpg