Zeng Yuqun, Liu Xusheng, Wang Yanshan, Shen Feichen, Liu Sijia, Rastegar-Mojarad Majid, Wang Liwei, Liu Hongfang
The Second Clinical College, Guangzhou University of Chinese Medicine, Guangzhou, China.
Department of Health Sciences Research, Mayo College of Medicine, Mayo Clinic, Rochester, MN, United States.
J Med Internet Res. 2017 Oct 16;19(10):e342. doi: 10.2196/jmir.7754.
Self-management is crucial to diabetes care, and providing expert-vetted content that answers patients' questions is key to facilitating it.
This study aimed to investigate the use of information retrieval techniques to recommend patient education materials in response to patients' diabetes-related questions.
We compared two retrieval algorithms, one based on Latent Dirichlet Allocation topic modeling (the topic modeling-based model) and one based on semantic groups (the semantic group-based model), with a baseline retrieval model, the vector space model (VSM), in recommending diabetes patient education materials for diabetes-related questions posted on the TuDiabetes forum. The evaluation used a gold standard dataset of 50 randomly selected questions, for which two experts manually assigned the relevance of each education material to each question. Performance was assessed using the precision of the top-ranked documents.
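To make the VSM baseline concrete, the following is a minimal sketch of TF-IDF retrieval with cosine similarity, written with the Python standard library only. It is an illustration under stated assumptions, not the study's implementation: the tokenizer, the smoothed IDF weighting, the function names, and the three sample materials are all hypothetical, and the abstract does not specify the weighting scheme the authors used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs_tokens):
    """Smoothed inverse document frequency over the education-material corpus."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are ignored."""
    term_freq = Counter(t for t in tokens if t in idf)
    return {t: count * idf[t] for t, count in term_freq.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(question, documents):
    """Return document indices ordered by decreasing similarity to the question."""
    docs_tokens = [tokenize(d) for d in documents]
    idf = build_idf(docs_tokens)
    doc_vectors = [tfidf_vector(tokens, idf) for tokens in docs_tokens]
    query_vector = tfidf_vector(tokenize(question), idf)
    scored = [(cosine(query_vector, vec), i) for i, vec in enumerate(doc_vectors)]
    # Ties are broken by original document order.
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Hypothetical sample data, not taken from the study.
materials = [
    "Insulin injection sites and how to rotate them",
    "Counting carbohydrates when planning meals",
    "Foot care and daily skin checks",
]
ranking = rank_documents("Where should I inject my insulin?", materials)
```

In this toy example only "insulin" matches exactly: "inject" in the question does not match "injection" in the material. A purely lexical model cannot bridge that kind of vocabulary mismatch, which is the gap the topic modeling-based and semantic group-based models are meant to address.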
We retrieved 7510 diabetes-related questions from the forum and 144 diabetes patient education materials from the patient education database at Mayo Clinic. The proportion of words that mapped to the Unified Medical Language System (UMLS) differed significantly between the two corpora (P<.001). The topic modeling-based model outperformed the other retrieval algorithms. For example, for the top-retrieved document, the precision of the topic modeling-based, semantic group-based, and VSM models was 67.0%, 62.8%, and 54.3%, respectively.
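Precision of the top-ranked documents is commonly computed as precision at rank k, averaged over questions. The sketch below assumes binary relevance judgments and divides by k even when fewer than k documents are returned. Both choices are assumptions, since the abstract does not give the exact definition used, and the question and document identifiers are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


def mean_precision_at_k(rankings, judgments, k):
    """Average precision@k over all questions that have a ranking."""
    values = [precision_at_k(rankings[q], judgments[q], k) for q in rankings]
    return sum(values) / len(values)


# Hypothetical rankings and expert judgments for two questions.
rankings = {
    "q1": ["d3", "d1", "d2"],
    "q2": ["d2", "d3", "d1"],
}
judgments = {
    "q1": {"d3", "d2"},
    "q2": {"d1"},
}

p_at_1 = mean_precision_at_k(rankings, judgments, 1)
p_at_3 = mean_precision_at_k(rankings, judgments, 3)
```

With k = 1, the measure reduces to the share of questions whose first recommended material was judged relevant.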
This study demonstrated that topic modeling can mitigate the vocabulary difference between patient questions and education materials, and that it achieved the best performance in recommending education materials to answer patients' questions. Future work should assess the generalizability of these findings and extend the study to other disease areas, other sources of patient education materials, and other online forums.