Recognizing names in biomedical texts using mutual information independence model and SVM plus sigmoid.

Author information

Zhou G D

Affiliation

Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore.

Publication information

Int J Med Inform. 2006 Jun;75(6):456-67. doi: 10.1016/j.ijmedinf.2005.06.012. Epub 2005 Aug 19.

DOI:10.1016/j.ijmedinf.2005.06.012

PMID:16112894

Abstract

This paper presents PowerBioNE, a biomedical name recognition system. To handle phenomena specific to the biomedical domain, various evidential features are proposed and integrated through a mutual information independence model (MIIM). A support vector machine (SVM) plus sigmoid is then proposed to resolve the data sparseness problem in the MIIM, which yields a biomedical name recognition system with better performance and better portability. Finally, two post-processing modules handle nested entity names and abbreviations in the biomedical domain to further improve performance. Evaluation shows that the system achieves F-measures of 69.1 and 71.2 on the 23 classes of GENIA V1.1 and V3.0, respectively, and an F-measure of 77.8 on the "protein" class of GENIA V3.0. The evaluation also shows that the system outperforms the best previously reported system on GENIA V1.1 and V3.0.
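
The abstract does not spell out how the "SVM plus sigmoid" component works. A common reading is that a sigmoid maps raw SVM decision values to probabilities, which a probabilistic model such as the MIIM can then consume. The sketch below illustrates that general idea only and is not the paper's implementation: the parametric form 1 / (1 + exp(a*score + b)), the function names `sigmoid_probability` and `fit_sigmoid`, the plain gradient-descent fit, and the toy scores are all assumptions made for illustration.

```python
import math


def sigmoid_probability(score, a, b):
    """Map a raw SVM decision value to a probability: 1 / (1 + exp(a*score + b))."""
    z = a * score + b
    # Evaluate in a numerically stable way for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_sigmoid(scores, labels, lr=0.1, steps=2000):
    """Fit a and b by gradient descent on the negative log-likelihood.

    labels are 0 or 1. This is a simplified stand-in for the more careful
    optimisation procedures used in practice (assumption, not from the paper).
    """
    a, b = -1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid_probability(s, a, b)
            # With p = 1 / (1 + exp(z)) and z = a*s + b, d(NLL)/dz = y - p.
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical SVM decision values and gold labels (1 = token is part of a name).
toy_scores = [-2.0, -1.0, -0.5, 0.3, 0.5, 1.0, 2.0]
toy_labels = [0, 0, 1, 0, 1, 1, 1]

a, b = fit_sigmoid(toy_scores, toy_labels)
for s in toy_scores:
    print(f"score={s:+.1f}  P(name)={sigmoid_probability(s, a, b):.3f}")
```

Fitting the sigmoid on held-out decision values, rather than on the data used to train the SVM, is the usual way to keep the resulting probabilities from being overconfident.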
