A benchmark dataset and case study for Chinese medical question intent classification.

Affiliation

Inner Mongolia Key Laboratory of Mongolian Information Processing Technology, College of Computer Science, Inner Mongolia University, University West Road, Hohhot, China.

Publication information

BMC Med Inform Decis Mak. 2020 Jul 9;20(Suppl 3):125. doi: 10.1186/s12911-020-1122-3.

DOI:10.1186/s12911-020-1122-3
PMID:32646426
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7346345/
Abstract

BACKGROUND

To provide satisfying answers, a medical QA system has to understand the intent of users' questions precisely. Training a deep-learning model for medical intent classification in a supervised way requires high-quality datasets. Currently, there is no public dataset for Chinese medical intent classification, and datasets from other fields are not applicable to medical QA systems. To solve this problem, we construct a Chinese medical intent dataset (CMID) using questions from medical QA websites. On this basis, we compare four intent classification models on CMID in a case study.

METHODS

The questions in CMID are obtained from several medical QA websites. The intent annotation standard, developed by medical experts, includes four types and 36 subtypes of user intent. Besides the intent label, CMID provides two types of additional information: word segmentation and named entities. We use crowdsourcing to annotate the intent of each Chinese medical question. Word segmentation and named entities are obtained using Jieba and a well-trained Lattice-LSTM model. To obtain more accurate segmentation, we loaded a Chinese medical dictionary of 530,000 words. We also select four popular deep learning-based models and compare their intent classification performance on CMID.
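To illustrate why a domain dictionary helps word segmentation, the following is a minimal sketch using greedy forward maximum matching. This is a deliberately simpler stand-in, not Jieba's algorithm and not the authors' pipeline; the function name `forward_max_match`, the two tiny word sets, and the example question are all hypothetical.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy longest-match segmentation.

    Characters that match no dictionary word are emitted as single tokens.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as the fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy vocabularies (the dictionary in the abstract has 530,000 words).
general_words = {"怎么", "治疗", "感冒", "小儿"}
medical_words = {"支气管炎", "小儿支气管炎"}

question = "小儿支气管炎怎么治疗"
without_medical = forward_max_match(question, general_words)
with_medical = forward_max_match(question, general_words | medical_words)

print(without_medical)
print(with_medical)
```

Without the medical entries, the disease name falls apart into single characters; with them, it survives as one token, which is the effect a loaded medical dictionary is meant to have on a segmenter.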

RESULTS

The final CMID contains 12,000 Chinese medical questions and is organized in JSON format. Each question is labeled with intent, word segmentation, and named entity information. Question length and number of entities are also analyzed in detail. Among FastText, TextCNN, TextRNN, and TextGCN, the FastText and TextCNN models achieved the best results in the four-type and 36-subtype intent classification tasks, respectively.
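As a rough picture of what a JSON-organized record with intent, segmentation, and entity annotations might look like, here is a small sketch that loads a few records and computes the kind of statistics mentioned above. The abstract does not give the schema, so every field name (`text`, `intent_type`, `intent_subtype`, `tokens`, `entities`) and every label value below is an assumption made for illustration, and the three questions are invented examples rather than CMID data.

```python
import json
from collections import Counter

# Hypothetical records: field names and label values are illustrative only
# and are not taken from the actual CMID files.
raw = """
[
  {"text": "感冒吃什么药",
   "intent_type": "drug", "intent_subtype": "recommendation",
   "tokens": ["感冒", "吃", "什么", "药"],
   "entities": [{"mention": "感冒", "start": 0, "end": 2, "label": "disease"}]},
  {"text": "高血压有哪些症状",
   "intent_type": "condition", "intent_subtype": "symptom",
   "tokens": ["高血压", "有", "哪些", "症状"],
   "entities": [{"mention": "高血压", "start": 0, "end": 3, "label": "disease"}]},
  {"text": "阿司匹林有什么副作用",
   "intent_type": "drug", "intent_subtype": "side_effect",
   "tokens": ["阿司匹林", "有", "什么", "副作用"],
   "entities": [{"mention": "阿司匹林", "start": 0, "end": 4, "label": "drug"}]}
]
"""
records = json.loads(raw)

# Consistency checks: tokens must rebuild the question, and each entity
# span must point at its mention.
for record in records:
    assert "".join(record["tokens"]) == record["text"]
    for entity in record["entities"]:
        assert record["text"][entity["start"]:entity["end"]] == entity["mention"]

# Simple dataset statistics: label distribution, question length, entity count.
type_counts = Counter(record["intent_type"] for record in records)
mean_length = sum(len(record["text"]) for record in records) / len(records)
mean_entities = sum(len(record["entities"]) for record in records) / len(records)

print(type_counts, mean_length, mean_entities)
```

Keeping character offsets alongside each entity mention, as assumed here, makes it cheap to verify annotations against the question text before training a classifier.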

CONCLUSIONS

In this work, we provide a dataset for Chinese medical intent classification, which can be used in medical QA and related fields. We performed an intent classification task on CMID and also analyzed the content of the dataset.

Similar articles

1. A benchmark dataset and case study for Chinese medical question intent classification.
BMC Med Inform Decis Mak. 2020 Jul 9;20(Suppl 3):125. doi: 10.1186/s12911-020-1122-3.
2. An attention-based multi-task model for named entity recognition and intent analysis of Chinese online medical questions.
J Biomed Inform. 2020 Aug;108:103511. doi: 10.1016/j.jbi.2020.103511. Epub 2020 Jul 14.
3. Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods.
JMIR Med Inform. 2018 Dec 17;6(4):e50. doi: 10.2196/medinform.9965.
4. Applying deep matching networks to Chinese medical question answering: a study and a dataset.
BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):52. doi: 10.1186/s12911-019-0761-8.
5. Chinese Clinical Named Entity Recognition in Electronic Medical Records: Development of a Lattice Long Short-Term Memory Model With Contextualized Character Representations.
JMIR Med Inform. 2020 Sep 4;8(9):e19848. doi: 10.2196/19848.
6. A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records.
BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):65. doi: 10.1186/s12911-019-0762-7.
7. An intent classification method for questions in "Treatise on Febrile diseases" based on TinyBERT-CNN fusion model.
Comput Biol Med. 2023 Aug;162:107075. doi: 10.1016/j.compbiomed.2023.107075. Epub 2023 May 29.
8. Multi-task learning for Chinese clinical named entity recognition with external knowledge.
BMC Med Inform Decis Mak. 2021 Dec 31;21(1):372. doi: 10.1186/s12911-021-01717-1.
9. Answering medical questions in Chinese using automatically mined knowledge and deep neural networks: an end-to-end solution.
BMC Bioinformatics. 2022 Apr 15;23(1):136. doi: 10.1186/s12859-022-04658-2.
10. SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.
Artif Intell Med. 2020 Jan;102:101767. doi: 10.1016/j.artmed.2019.101767. Epub 2019 Nov 28.

Cited by

1. Research on performance variations of classifiers with the influence of pre-processing methods for Chinese short text classification.
PLoS One. 2023 Oct 12;18(10):e0292582. doi: 10.1371/journal.pone.0292582. eCollection 2023.
2. A Comparative Study of Natural Language Processing Algorithms Based on Cities Changing Diabetes Vulnerability Data.
Healthcare (Basel). 2022 Jun 15;10(6):1119. doi: 10.3390/healthcare10061119.