Bekhuis Tanja
Department of Library & Information Science, School of Information Sciences, University of Pittsburgh, 135 North Bellefield Avenue, Pittsburgh, PA 15260, USA.
Biomed Digit Libr. 2006 Apr 3;3:2. doi: 10.1186/1742-5581-3-2.
Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining. In recent years, conceptual biology has emerged as a complement to empirical biology, partly in response to the availability of massive digital resources such as the network of databases for molecular biologists at the National Center for Biotechnology Information. Developments in text mining and hypothesis discovery systems based on the early work of Swanson, a mathematician and information scientist, coincide with the emergence of conceptual biology. Very little has been written to introduce biomedical digital librarians to these new trends. This paper presents background on data and text mining, as well as on knowledge discovery in databases (KDD) and in text (KDT), then briefly reviews Swanson's ideas and discusses recent approaches to hypothesis discovery and testing. 'Testing' in the context of text mining involves partially automated methods for finding evidence in the literature to support hypothetical relationships. Concluding remarks address (a) the limits of current strategies for evaluating hypothesis discovery systems and (b) the role of literature-based discovery in concert with empirical research. A report of an informatics-driven literature review for biomarkers of systemic lupus erythematosus is also mentioned. Swanson's vision of the hidden value in the literature of science and, by extension, in biomedical digital databases, is still remarkably generative for information scientists, biologists, and physicians.