Baumgartner William A, Lu Zhiyong, Johnson Helen L, Caporaso J Gregory, Paquette Jesse, Lindemann Anna, White Elizabeth K, Medvedeva Olga, Cohen K Bretonnel, Hunter Lawrence
Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, Colorado 80045, USA.
Genome Biol. 2008;9 Suppl 2(Suppl 2):S9. doi: 10.1186/gb-2008-9-s2-s9. Epub 2008 Sep 1.
Reliable information extraction applications have been a long-sought goal of the biomedical text mining community. If reached, this goal would provide valuable tools to benchside biologists in their increasingly difficult task of assimilating the knowledge contained in the biomedical literature. We present an integrated approach to concept recognition in biomedical text. Concept recognition provides key information that has been largely missing from previous biomedical information extraction efforts: direct links to well-defined knowledge resources that explicitly cement each concept's semantics. The BioCreative II tasks discussed in this special issue have provided a unique opportunity to demonstrate the effectiveness of concept recognition in the field of biomedical language processing.
Through the modular construction of a protein interaction relation extraction system, we present several use cases of concept recognition in biomedical text, and relate these use cases to potential uses by the benchside biologist.
Current information extraction technologies are approaching the performance standards at which concept recognition can begin to deliver high-quality data to the benchside biologist. Our system is available as part of the BioCreative Meta-Server project and on the Internet at http://bionlp.sourceforge.net.