Zhou GuoDong, Shen Dan, Zhang Jie, Su Jian, Tan SoonHeng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore.
BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S7. doi: 10.1186/1471-2105-6-S1-S7. Epub 2005 May 24.
This paper proposes an ensemble of classifiers for biomedical name recognition in which three classifiers (one Support Vector Machine and two discriminative Hidden Markov Models) are combined using a simple majority voting strategy. In addition, we incorporate three post-processing modules into the system to further improve performance: an abbreviation resolution module, a protein/gene name refinement module and a simple dictionary matching module. Evaluation shows that our system achieves the best performance among 10 systems, with a balanced F-measure of 82.58 on the closed evaluation of the BioCreative protein/gene name recognition task (Task 1A).
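The majority voting step can be illustrated with a minimal sketch. The abstract does not give these details, so the following are assumptions: voting is done per token, tags follow a BIO-style scheme with hypothetical names such as B-GENE, and when no tag wins a strict majority the output of one designated classifier is used. The function name majority_vote and the example tag sequences are likewise illustrative only.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], fallback: int = 0) -> List[str]:
    """Combine per-token tag sequences from several classifiers by majority vote.

    predictions: one tag sequence per classifier, all of the same length.
    fallback: index of the classifier whose tag is used when no tag wins a
              strict majority (an assumption; the abstract does not say how
              ties are broken).
    """
    if not predictions:
        raise ValueError("need at least one classifier output")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all tag sequences must have the same length")

    combined = []
    # zip(*predictions) yields, for each token, the tags proposed by every classifier
    for tags in zip(*predictions):
        tag, count = Counter(tags).most_common(1)[0]
        if count * 2 > len(tags):
            combined.append(tag)             # strict majority agrees on this tag
        else:
            combined.append(tags[fallback])  # no majority: defer to the fallback classifier
    return combined


# Hypothetical outputs of the three classifiers for the tokens "the", "p53", "protein"
svm_tags = ["O", "B-GENE", "I-GENE"]
hmm1_tags = ["O", "B-GENE", "O"]
hmm2_tags = ["O", "O", "I-GENE"]

print(majority_vote([svm_tags, hmm1_tags, hmm2_tags]))
# prints ['O', 'B-GENE', 'I-GENE']
```

With three voters and more than two possible tags, a three-way disagreement is possible, which is why the sketch needs an explicit fallback rule. Voting per token can also produce inconsistent tag sequences (for example an I-GENE without a preceding B-GENE), so a real system would need to repair or re-score such spans.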