Yang Zhihao, Lin Hongfei, Li Yanpeng
Department of Computer Science and Engineering, Dalian University of Technology, 116023 Dalian, China.
Comput Biol Chem. 2008 Aug;32(4):287-91. doi: 10.1016/j.compbiolchem.2008.03.008. Epub 2008 Apr 1.
Bio-entity name recognition is a key step in extracting information from the biomedical literature. This paper presents a dictionary-based approach to bio-entity name recognition. The approach expands the bio-entity name dictionary using an abbreviation-definition identification algorithm and improves recall with an improved edit-distance algorithm. It then applies several post-processing methods to further improve performance: pre-keyword and post-keyword expansion, part-of-speech expansion, merging of adjacent bio-entity names, and exploitation of contextual cues. Experimental results show that, with this approach, even a system based on an internal dictionary can achieve fairly good performance.
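The first two steps, dictionary expansion through abbreviation definitions and edit-distance matching to raise recall, can be illustrated with a minimal Python sketch under stated assumptions. The abstract does not specify either algorithm. The abbreviation step therefore uses the backward character-matching heuristic of Schwartz and Hearst as a stand-in, and the "improved" edit distance is replaced by plain Levenshtein distance normalized by string length. The function names, the normalization rules, the 0.2 threshold, and the rule "add a short form when its long form matches a dictionary entry" are hypothetical and do not come from the paper.

```python
import re

# --- Abbreviation-definition extraction (Schwartz and Hearst style heuristic; an assumption) ---

def best_long_form(short, candidate):
    """Match short-form characters right-to-left inside the candidate text.

    The first character of the short form must begin a word. Returns the long
    form (from the start of that word to the end of the candidate) or None.
    """
    s_idx, l_idx = len(short) - 1, len(candidate) - 1
    while s_idx >= 0:
        c = short[s_idx].lower()
        if c.isalnum():  # punctuation in the short form (e.g. "IL-2") is skipped
            while l_idx >= 0 and (
                candidate[l_idx].lower() != c
                or (s_idx == 0 and l_idx > 0 and candidate[l_idx - 1].isalnum())
            ):
                l_idx -= 1
            if l_idx < 0:
                return None
            l_idx -= 1
        s_idx -= 1
    start = candidate.rfind(" ", 0, l_idx + 1) + 1
    return candidate[start:]


def extract_pairs(sentence):
    """Find 'long form (SHORT)' pairs in one sentence."""
    pairs = []
    for m in re.finditer(r"\(([^()]{2,10})\)", sentence):
        short = m.group(1).strip()
        if not short or not short[0].isalnum() or not any(ch.isalpha() for ch in short):
            continue
        words = sentence[: m.start()].split()
        max_words = min(len(short) + 5, len(short) * 2)  # candidate window, in words
        candidate = " ".join(words[-max_words:])
        long_form = best_long_form(short, candidate)
        if long_form and len(long_form) > len(short):
            pairs.append((short, long_form))
    return pairs

# --- Approximate lookup (plain Levenshtein; the paper's "improved" variant is not reproduced) ---

def levenshtein(a, b):
    """Unit-cost edit distance computed with two rolling rows."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalize(name):
    """Hypothetical normalization: lowercase, collapse spaces/hyphens/underscores."""
    return re.sub(r"[\s\-_]+", " ", name.lower()).strip()


def fuzzy_lookup(candidate, dictionary, max_ratio=0.2):
    """Closest entry whose length-normalized edit distance is <= max_ratio, else None."""
    cand = normalize(candidate)
    best, best_ratio = None, None
    for entry in dictionary:
        norm = normalize(entry)
        ratio = levenshtein(cand, norm) / max(len(cand), len(norm), 1)
        if ratio <= max_ratio and (best_ratio is None or ratio < best_ratio):
            best, best_ratio = entry, ratio
    return best


def expand_dictionary(dictionary, text):
    """Add short forms whose long form approximately matches a dictionary entry."""
    expanded = set(dictionary)
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for short, long_form in extract_pairs(sentence):
            if fuzzy_lookup(long_form, dictionary) is not None:
                expanded.add(short)
    return expanded


if __name__ == "__main__":
    names = {"tumor necrosis factor", "interleukin-2"}
    text = "We measured tumour necrosis factor (TNF) in serum. Levels rose."
    print(extract_pairs(text))                     # [('TNF', 'tumour necrosis factor')]
    print(fuzzy_lookup("Interleukin 2", names))    # interleukin-2
    print(sorted(expand_dictionary(names, text)))  # ['TNF', 'interleukin-2', 'tumor necrosis factor']
```

Because entries are compared after lowercasing and collapsing hyphens, "interleukin-2" and "Interleukin 2" match at distance 0. The spelling variant "tumour necrosis factor" still matches "tumor necrosis factor" because one edit over 22 characters (about 0.05) stays under the hypothetical 0.2 threshold. The linear scan over the dictionary is only for clarity; a practical system would index entries (for example by length or character n-grams) before computing distances.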