Habib Mona Soliman, Kalita Jugal
Cairo Microsoft Innovation Lab, 306 Korniche El-Nile, Maadi Cairo, Egypt.
Int J Bioinform Res Appl. 2010;6(2):191-208. doi: 10.1504/IJBRA.2010.032121.
This paper explores scalability issues associated with the Named Entity Recognition problem in the biomedical publications domain using Support Vector Machines (SVMs). The performance of existing binary and multi-class SVMs with increasing amounts of training data is compared to results obtained using our new implementations. Our approach eliminates the need for prior language- or domain-specific knowledge and achieves good out-of-the-box accuracy, comparable to that obtained using more complex approaches. The training time of multi-class SVMs is reduced by several orders of magnitude, which would make SVMs a more viable and practical solution for real-world problems with large datasets.