Multilevel Classification of Users' Needs in Chinese Online Medical and Health Communities: Model Development and Evaluation Based on Graph Convolutional Network

Author information

Cheng Quan, Lin Yingru

Affiliation

School of Economics and Management, Fuzhou University, Fuzhou, China.

Publication information

JMIR Form Res. 2023 Apr 20;7:e42297. doi: 10.2196/42297.

DOI: 10.2196/42297

PMID: 37079346

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10160934/

Abstract

BACKGROUND

Online medical and health communities provide a platform for internet users to share experiences and ask questions about medical and health issues. However, these communities have problems, such as inaccurate classification of users' questions and uneven health literacy among users, which affect both the accuracy of users' retrieval and the professionalism of the medical personnel who answer the questions. It is therefore essential to study more effective methods for classifying users' information needs.

OBJECTIVE

Most online medical and health communities provide only disease-type labels, which do not give a comprehensive summary of users' needs. This study aims to construct a multilevel classification framework, based on the graph convolutional network (GCN) model, for users' needs in online medical and health communities so that users can perform more targeted information retrieval.

METHODS

Using the Chinese online medical and health community "Qiuyi" as an example, we crawled questions posted by users in the "Cardiovascular Disease" section as the data source. First, the disease types involved in the question data were categorized by manual coding to generate the first-level labels. Second, users' needs were identified by K-means clustering to generate information-need labels, which serve as the second-level labels. Finally, a GCN model was constructed to classify users' questions automatically, thus realizing the multilevel classification of users' needs.
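The abstract does not describe the GCN architecture, so the following is only a minimal sketch of the standard GCN propagation rule, in which each layer computes D^(-1/2) (A + I) D^(-1/2) H W. The toy graph, the layer sizes, the random untrained weights, and names such as gcn_forward are illustrative assumptions and are not taken from the study.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2): add self-loops, then scale by node degree."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def gcn_forward(adj, features, w1, w2):
    """Two-layer GCN forward pass: softmax(A_norm ReLU(A_norm X W1) W2)."""
    a_norm = normalize_adjacency(adj)
    hidden = [[max(v, 0.0) for v in row] for row in matmul(matmul(a_norm, features), w1)]
    logits = matmul(matmul(a_norm, hidden), w2)
    return [softmax(row) for row in logits]

def random_matrix(rows, cols, rng):
    """Hypothetical weight initializer; a real model would learn these weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Toy graph (assumption): 4 nodes on a path 0-1-2-3, standing in for text nodes.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # one-hot inputs

rng = random.Random(0)
w1 = random_matrix(4, 8, rng)   # input dimension 4 -> hidden dimension 8
w2 = random_matrix(8, 3, rng)   # hidden dimension 8 -> 3 hypothetical classes

probs = gcn_forward(adj, features, w1, w2)          # one probability row per node
predicted = [row.index(max(row)) for row in probs]  # untrained, so labels are arbitrary
```

Because the weights are random and no training step is shown, the predicted classes carry no meaning; the sketch only illustrates how neighboring nodes' features are mixed before each linear transformation.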

RESULTS

In an empirical study of questions posted by users in the "Cardiovascular Disease" section of Qiuyi, the hierarchical classification of users' questions was realized. The classification model designed in the study achieved accuracy, precision, recall, and F1-score of 0.6265, 0.6328, 0.5788, and 0.5912, respectively. It outperformed both a traditional machine learning method (naïve Bayes) and a deep learning method (hierarchical text classification convolutional neural network). We also performed a single-level classification experiment on users' needs, which showed a great improvement in comparison with the multilevel classification model.
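The abstract does not state how the four scores were averaged across classes. The reported F1-score (0.5912) is lower than the harmonic mean of the reported precision and recall (about 0.605), which would be consistent with averaging per-class scores, although this is not confirmed by the text. The sketch below assumes macro averaging and treats each multilevel label as a (disease, need) pair; the label names and the function macro_scores are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Hypothetical multilevel labels: (first-level disease, second-level need).
y_true = [
    ("hypertension", "treatment"),
    ("hypertension", "diagnosis"),
    ("arrhythmia", "treatment"),
    ("arrhythmia", "treatment"),
]
y_pred = [
    ("hypertension", "treatment"),
    ("hypertension", "treatment"),
    ("arrhythmia", "treatment"),
    ("hypertension", "diagnosis"),
]

scores = macro_scores(y_true, y_pred)
```

In this toy example the macro precision and recall are both 0.5, while the macro F1 is 4/9 (about 0.444), showing that a macro F1 need not equal the harmonic mean of the macro precision and recall.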

CONCLUSIONS

A multilevel classification framework was designed based on the GCN model. The results demonstrate that the method is effective in classifying users' information needs in online medical and health communities. They also show that users with different diseases have different information needs, a finding that can help online medical and health communities provide diversified and targeted services. Our method is also applicable to other similar disease classifications.
