Saha Sujan Kumar, Prakash Amit, Majumder Mukta
1 Department of Computer Science and Engineering, Birla Institute of Technology Mesra, Ranchi, 835215 India.
2 Department of Computer Science and Application, University of North Bengal, West Bengal, India.
Health Inf Sci Syst. 2019 Feb 18;7(1):4. doi: 10.1007/s13755-019-0067-3. eCollection 2019 Dec.
Online remedy finders and health-related discussion forums have become increasingly popular in recent years. Ordinary web users describe their health problems there and request suggestions from experts or other users. As a result, these forums have become a large repository of information and discussions on various health issues. An intelligent information retrieval system can help make this repository usable in a range of applications. In this paper, we propose a system that, given a new post, automatically identifies similar existing forum posts. The system is based on computing the similarity between two patient-authored texts. To compute the similarity between the current post and existing posts, the system uses a hybrid strategy that combines template information, topic modelling, and latent semantic indexing. The system is tested on a set of real questions collected from a homeopathy forum, abchomeopathy.com. The relevance of the posts retrieved by the system is judged by human experts. The evaluation shows that the system achieves a precision of 88.87%.
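The retrieval and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the template, topic-modelling, and latent semantic indexing scores are computed or combined, so the sketch assumes a simple weighted average (`hybrid_score`, with hypothetical component names and weights) and uses plain cosine similarity over word counts as a stand-in for a single component. All function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(scores, weights):
    """Weighted average of per-component scores (combination rule is an assumption)."""
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total


def rank_posts(new_post, existing_posts, top_k=3):
    """Rank existing posts by word-count cosine similarity to the new post.

    This lexical score only stands in for one component of a hybrid system.
    """
    query = Counter(tokenize(new_post))
    scored = [(cosine(query, Counter(tokenize(p))), p) for p in existing_posts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


def precision(retrieved, relevant):
    """Fraction of retrieved posts that human judges marked as relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for p in retrieved if p in relevant) / len(retrieved)
```

In this sketch, precision is computed as it is usually defined in information retrieval: the number of retrieved posts judged relevant divided by the total number of retrieved posts. The abstract reports 88.87% for the actual system; the code does not attempt to reproduce that figure.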