Berry de Bruijn, Joel Martin
Institute for Information Technology, National Research Council, Montreal Road Bldg M50, Ottawa, Ont, Canada K1A 0R6.
Int J Med Inform. 2002 Dec 4;67(1-3):7-18. doi: 10.1016/s1386-5056(02)00050-3.
Literature mining is the process of extracting and combining facts from scientific publications. In recent years, many computer programs have been designed to extract various molecular biology findings from Medline abstracts or full-text articles. The present article describes the range of text mining techniques that have been applied to scientific documents. It divides 'automated reading' into four general subtasks: text categorization, named entity tagging, fact extraction, and collection-wide analysis. Literature mining offers powerful methods to support knowledge discovery and the construction of topic maps and ontologies. An overview is given of recent developments in medical language processing. Special attention is given to the domain particularities of molecular biology and to the emerging synergy between literature mining and molecular databases accessible through the Internet.
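The four subtasks named in the abstract can be illustrated with a toy pipeline. This is a minimal sketch and not the article's method: the gene lexicon, the topic keywords, the single "binds" pattern, and all function names are hypothetical choices made for illustration, standing in for the far richer resources a real system would need.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy resources; a real system would use curated lexicons.
GENE_LEXICON = {"BRCA1", "TP53", "MDM2"}
TOPIC_KEYWORDS = {"gene", "protein", "binds", "expression"}
# Hypothetical pattern covering one relation type: "<A> binds <B>".
BINDS_PATTERN = re.compile(r"\b([A-Z0-9]+) binds (?:to )?([A-Z0-9]+)\b")


def categorize(text):
    """Text categorization: keep documents that mention a topic keyword."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & TOPIC_KEYWORDS)


def tag_entities(text):
    """Named entity tagging: exact dictionary lookup of gene names."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [t for t in tokens if t in GENE_LEXICON]


def extract_facts(text):
    """Fact extraction: pattern matches restricted to known entities."""
    return [
        (a, "binds", b)
        for a, b in BINDS_PATTERN.findall(text)
        if a in GENE_LEXICON and b in GENE_LEXICON
    ]


def collection_analysis(docs):
    """Collection-wide analysis: count entity pairs sharing a document."""
    pairs = Counter()
    for doc in docs:
        entities = sorted(set(tag_entities(doc)))
        pairs.update(combinations(entities, 2))
    return pairs


def mine(docs):
    """Run the four subtasks in sequence over a list of texts."""
    relevant = [d for d in docs if categorize(d)]
    facts = [f for d in relevant for f in extract_facts(d)]
    return relevant, facts, collection_analysis(relevant)
```

Calling `mine` on a list of abstract texts returns the documents judged relevant, the extracted (subject, relation, object) triples, and a co-occurrence count over entity pairs. Each stage here is deliberately naive (keyword overlap, dictionary lookup, one regular expression); the point is only to show how the subtasks feed one another.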