Dolamic Ljiljana, Boyer Célia
Health on the Net Foundation, Geneva, Switzerland.
Stud Health Technol Inform. 2013;192:1133.
This paper describes and evaluates a classification model for public health web pages based on key phrase extraction and matching. Easily extensible to both new classes and new languages, the method proves to be a good solution for text classification when no training data is available. To evaluate the proposed solution, we used a small collection of public health web pages labeled through double-blind manual classification. Our experiments show that, by choosing an appropriate threshold value, the desired level of either precision or recall can be achieved.
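The threshold trade-off described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, key phrase lists, sample pages, and the scoring rule (the fraction of a class's key phrases found in the page) are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical key phrases per class (illustrative only, not from the paper).
CLASS_PHRASES: Dict[str, List[str]] = {
    "vaccination": ["vaccine", "immunization schedule", "booster dose"],
    "nutrition": ["balanced diet", "vitamin", "food safety"],
}


def score_page(text: str, phrases: List[str]) -> float:
    """Assumed scoring rule: fraction of the class's key phrases found in the text."""
    lowered = text.lower()
    hits = sum(1 for phrase in phrases if phrase in lowered)
    return hits / len(phrases)


def classify(text: str, threshold: float) -> Set[str]:
    """Assign every class whose score reaches the threshold (may be empty)."""
    return {
        label
        for label, phrases in CLASS_PHRASES.items()
        if score_page(text, phrases) >= threshold
    }


def precision_recall(
    predicted: List[Set[str]], gold: List[Set[str]]
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over all (page, class) assignments."""
    true_pos = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 1.0
    recall = true_pos / n_gold if n_gold else 1.0
    return precision, recall


# Tiny made-up evaluation set standing in for the manually labeled pages.
PAGES = [
    "The vaccine booster dose follows the immunization schedule.",
    "Vitamin supplements and a vaccine were discussed.",
    "A balanced diet and food safety matter.",
]
GOLD = [{"vaccination"}, {"nutrition"}, {"nutrition"}]

if __name__ == "__main__":
    for threshold in (0.3, 0.6):
        predictions = [classify(page, threshold) for page in PAGES]
        precision, recall = precision_recall(predictions, GOLD)
        print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, the lower threshold accepts a spurious class for the second page but misses nothing, while the higher threshold drops the spurious class along with a correct one. Adding a class or a language in this sketch only requires a new entry in the phrase dictionary, which mirrors the extensibility claimed in the abstract.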