Departamento de Informática em Saúde, Universidade Federal de São Paulo, Rua Botucatu 862, Vila Clementino, São Paulo, SP, Brazil.
J Biomed Inform. 2011 Apr;44(2):299-309. doi: 10.1016/j.jbi.2010.12.002. Epub 2010 Dec 16.
Internet users increasingly turn to the World Wide Web to search for information about their health. This creates a need for specialized tools that can support users in those searches.
To apply and compare strategies developed to investigate whether the Portuguese version of Medical Subject Headings (MeSH) can be used to build an automated classifier that labels Brazilian Portuguese-language web content, aimed at the lay public, as falling within or outside the field of healthcare.
A total of 3658 Brazilian web pages were used to train the classifier and 606 Brazilian web pages were used to validate it. The proposed strategies were built on content-based vector methods for text classification: each strategy produced a feature vector for a page, and a Naive Bayes classifier was then used to classify those vectors.
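The classification step described above can be illustrated with a minimal sketch, assuming a multinomial Naive Bayes model over term counts. The vocabulary `VOCAB`, the training sentences, and the function names are hypothetical stand-ins chosen for illustration; they are not the MeSH-derived features or the data used in the study.

```python
import math
from collections import Counter

# Hypothetical stand-in for a controlled vocabulary of Portuguese terms;
# the study derived its features from the Portuguese version of MeSH.
VOCAB = ["saude", "doenca", "tratamento", "sintoma", "futebol", "carro", "preco"]


def vectorize(text):
    """Turn a text into a vector of counts, one entry per vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in VOCAB]


def train(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    model = {}
    for cls in set(labels):
        vectors = [vectorize(d) for d, y in zip(docs, labels) if y == cls]
        totals = [sum(column) for column in zip(*vectors)]
        denom = sum(totals) + len(VOCAB)
        model[cls] = {
            "log_prior": math.log(len(vectors) / len(docs)),
            "log_lik": [math.log((t + 1) / denom) for t in totals],
        }
    return model


def predict(model, text):
    """Return the class with the highest posterior log-score for the text."""
    vec = vectorize(text)
    scores = {
        cls: p["log_prior"] + sum(c * l for c, l in zip(vec, p["log_lik"]))
        for cls, p in model.items()
    }
    return max(scores, key=scores.get)


# Toy training set (invented for this sketch, not taken from the study).
DOCS = [
    "sintoma de doenca e tratamento",
    "saude e tratamento de doenca",
    "preco de carro",
    "futebol e carro",
]
LABELS = ["health", "health", "other", "other"]

model = train(DOCS, LABELS)
print(predict(model, "tratamento para doenca"))
print(predict(model, "preco do carro"))
```

Restricting the vector to a fixed vocabulary is what makes a controlled terminology useful here: words outside the vocabulary are simply ignored, so the classifier sees only the terms the vocabulary considers meaningful.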
A strategy named InDeCS was developed specifically to adapt MeSH to this problem. It achieved better accuracy than the other strategies on this pattern classification task (0.94 for sensitivity, specificity, and area under the ROC curve).
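For readers unfamiliar with the three reported measures, the following sketch shows one common way to compute them for a binary classifier. The abstract does not say how the authors computed them, so the function names and the toy labels and scores below are assumptions made purely for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive example scores higher than a random negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 1 = health page, 0 = non-health page.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec, roc_auc(y_true, scores))
```

Sensitivity and specificity depend on a single decision threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds, which is why the three values are usually reported together.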
Because of the significant results achieved by InDeCS, the tool has been successfully applied to the Brazilian healthcare search portal Busca Saúde. The study also showed that MeSH yields important results when used to classify web-based content aimed at the lay public, and that MeSH was able to map out mutable, non-deterministic characteristics of the web.