IEEE J Biomed Health Inform. 2021 Feb;25(2):591-601. doi: 10.1109/JBHI.2020.3032479. Epub 2021 Feb 5.
Today, information on the World Wide Web is overwhelmed by an unprecedented quantity of data on diverse topics and of varied quality. The quality of information disseminated in the field of medicine in particular has been questioned, as the negative consequences of health misinformation can be life-threatening. There is currently no generic automated tool for evaluating the quality of online health information across a broad scope. To address this gap, we applied a data mining approach to automatically assess the quality of online health articles against 10 quality criteria. We prepared a labelled dataset with 53012 features and applied different feature selection methods to identify the best feature subset, with which our trained classifier achieved an accuracy of [Formula: see text], varying across the 10 criteria. Our semantic analysis of the features shows the underlying associations between the selected features and the assessment criteria and further justifies our assessment approach. Our findings will help identify high-quality health articles, thereby aiding users in forming their opinions and making the right choices when seeking health-related help online.
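The abstract describes selecting a feature subset and then training a classifier for each quality criterion, but it does not name the feature selection methods or the classifier. The following is a minimal sketch of that general workflow for a single criterion, assuming a chi-squared score for feature selection and a Bernoulli naive Bayes classifier as stand-ins. The toy articles, the labels, and all function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def chi2_scores(docs, labels):
    """Chi-squared score of each term's presence against a binary label."""
    n, pos = len(docs), sum(labels)
    neg = n - pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            (df_pos if y else df_neg)[t] += 1
    scores = {}
    for t in set(df_pos) | set(df_neg):
        a, b = df_pos[t], df_neg[t]   # term present in positive / negative docs
        c, d = pos - a, neg - b       # term absent in positive / negative docs
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])

def train_nb(docs, labels, vocab):
    """Bernoulli naive Bayes with Laplace smoothing over the selected terms.

    Assumes both classes (0 and 1) occur in the training labels.
    """
    counts = {0: Counter(), 1: Counter()}
    n_class = Counter(labels)
    for tokens, y in zip(docs, labels):
        for t in set(tokens) & vocab:
            counts[y][t] += 1
    model = {"vocab": vocab, "prior": {}, "p": {}}
    for y in (0, 1):
        model["prior"][y] = math.log(n_class[y] / len(labels))
        model["p"][y] = {t: (counts[y][t] + 1) / (n_class[y] + 2) for t in vocab}
    return model

def predict(model, tokens):
    """Return the class (0 or 1) with the higher log-posterior."""
    present = set(tokens)
    log_post = {}
    for y in (0, 1):
        s = model["prior"][y]
        for t in model["vocab"]:
            p = model["p"][y][t]
            s += math.log(p if t in present else 1 - p)
        log_post[y] = s
    return max(log_post, key=log_post.get)

# Hypothetical toy data for ONE criterion: 1 = article meets it, 0 = it does not.
docs = [
    ["study", "references", "dosage", "trial"],
    ["references", "trial", "risk"],
    ["miracle", "cure", "buy"],
    ["buy", "now", "miracle"],
    ["trial", "references", "benefit"],
    ["cure", "secret", "buy"],
]
labels = [1, 1, 0, 0, 1, 0]

selected = select_top_k(chi2_scores(docs, labels), k=3)
model = train_nb(docs, labels, selected)
print(sorted(selected))
print(predict(model, ["trial", "references", "side", "effects"]))
print(predict(model, ["buy", "miracle", "pills"]))
```

In a setting with 10 criteria, the same steps would be repeated once per criterion, each with its own labels and its own selected subset. Accuracy would need to be measured on held-out articles rather than on the training data used here.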