Classifying Chinese Questions Related to Health Care Posted by Consumers Via the Internet.

Author information

Guo Haihong, Na Xu, Hou Li, Li Jiao

Affiliation

Institute of Medical Information & Library, Chinese Academy of Medical Sciences, Beijing, China.

Publication information

J Med Internet Res. 2017 Jun 20;19(6):e220. doi: 10.2196/jmir.7156.

Abstract

BACKGROUND

In question answering (QA) system development, question classification is crucial for identifying information needs and improving the accuracy of returned answers. Although the questions are domain-specific, they are asked by non-professionals, making the question classification task more challenging.

OBJECTIVE

This study aimed to classify health care-related questions posted by the general public (Chinese speakers) on the Internet.

METHODS

A topic-based classification schema for health-related questions was built by manually annotating randomly selected questions. The Kappa statistic was used to measure the interrater reliability of multiple annotation results. Using the above corpus, we developed a machine-learning method to automatically classify these questions into one of the following six classes: Condition Management, Healthy Lifestyle, Diagnosis, Health Provider Choice, Treatment, and Epidemiology.
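As an illustration of the agreement measure mentioned above, here is a minimal sketch of Cohen's kappa for two annotators. The abstract does not say which kappa variant was used or how many annotators took part, so the two-rater form, the function name `cohen_kappa`, and the toy labels below are all assumptions made for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations, not data from the study.
annotator_1 = ["Treatment", "Diagnosis", "Treatment",
               "Epidemiology", "Diagnosis", "Treatment"]
annotator_2 = ["Treatment", "Diagnosis", "Diagnosis",
               "Epidemiology", "Diagnosis", "Treatment"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when a few classes dominate the sample. With more than two annotators, a multi-rater variant such as Fleiss' kappa would be the usual choice.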

RESULTS

The consumer health question schema was developed with four hierarchical levels of specificity, comprising 48 quaternary categories and 35 annotation rules. The 2000 sample questions were coded with 2000 major codes and 607 minor codes. Using natural language processing techniques, we represented the Chinese questions as sets of lexical, grammatical, and semantic features, and then selected the effective features to improve question classification performance. In the 6-category classification, we achieved an average precision of 91.41%, recall of 89.62%, and F score of 90.24%.
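The averaged scores above can be read with the following sketch, which computes per-class precision, recall, and F score and then takes their unweighted (macro) mean. The abstract does not state the averaging scheme, so macro-averaging is an assumption here, and the function name `macro_prf` and the toy labels are hypothetical. One observation in its favor: the harmonic mean of the reported precision and recall is about 90.51%, not 90.24%, which is what one would expect if the F score were averaged per class.

```python
def macro_prf(gold, predicted):
    """Per-class and macro-averaged precision, recall, and F score."""
    pairs = list(zip(gold, predicted))
    per_class = {}
    for c in sorted(set(gold) | set(predicted)):
        tp = sum(g == c and p == c for g, p in pairs)
        fp = sum(g != c and p == c for g, p in pairs)
        fn = sum(g == c and p != c for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f_score = 2 * precision * recall / total if total else 0.0
        per_class[c] = (precision, recall, f_score)
    k = len(per_class)
    # Macro average: every class counts equally, whatever its size.
    macro = tuple(sum(v[i] for v in per_class.values()) / k for i in range(3))
    return per_class, macro


# Hypothetical toy labels, not data from the study.
gold = ["Treatment", "Treatment", "Diagnosis",
        "Diagnosis", "Epidemiology", "Treatment"]
pred = ["Treatment", "Diagnosis", "Diagnosis",
        "Diagnosis", "Epidemiology", "Treatment"]
per_class, (macro_p, macro_r, macro_f) = macro_prf(gold, pred)
print(round(macro_p, 4), round(macro_r, 4), round(macro_f, 4))
```

In this toy case the macro F score (about 0.867) is lower than the harmonic mean of the macro precision and recall (about 0.889), the same kind of gap seen in the reported figures.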

CONCLUSIONS

In this study, we developed an automatic method to classify Chinese-language health care questions posted by the general public. It enables artificial intelligence (AI) agents to understand Internet users' health care information needs.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3530/5497072/2b8a6095b4aa/jmir_v19i6e220_fig1.jpg
