Automatic Labeled Dialogue Generation for Nursing Record Systems.

Authors

Mairittha Tittaya, Mairittha Nattaya, Inoue Sozo

Affiliation

Graduate School of Engineering, Kyushu Institute of Technology, 1-1 Sensui-cho, Tobata-ku, Kitakyushu-shi, Fukuoka 804-8550, Japan.

Publication details

J Pers Med. 2020 Jul 16;10(3):62. doi: 10.3390/jpm10030062.

Abstract

The integration of digital voice assistants in nursing residences is becoming increasingly important for improving nurses' productivity in documentation. A key idea behind such a system is training natural language understanding (NLU) modules that enable the machine to classify the purpose of a user utterance (intent) and extract pieces of valuable information present in the utterance (entities). One of the main obstacles to building robust NLU is the lack of sufficient labeled data, which generally relies on human labeling. This process is costly and time-consuming, particularly in the high-level nursing care domain, which requires abstract knowledge. In this paper, we propose an automatic dialogue labeling framework for NLU tasks, specifically for nursing record systems. First, we apply data augmentation techniques to create a collection of variant sample utterances. The individual evaluation result strongly shows a stratification rate with regard to both fluency and accuracy in utterances. We also investigate the possibility of applying deep generative models to our augmented dataset. A preliminary character-based long short-term memory (LSTM) model obtains an accuracy of 90% and generates varied, reasonable texts with a BLEU score of 0.76. Second, we introduce an approach to intent and entity labeling that uses feature embeddings and semantic similarity-based clustering. We also empirically evaluate different embedding methods to find the representations best suited to our data and clustering tasks. Experimental results show that fastText embeddings perform strongly on both intent labeling and entity labeling, achieving F1-scores of 0.79 and 0.78 and silhouette scores of 0.67 and 0.61, respectively.
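
To make the similarity-based labeling step and the silhouette score more concrete, here is a minimal sketch and not the authors' implementation. It assumes character trigram counts as a crude stand-in for fastText subword embeddings, cosine distance as the similarity measure, and one hand-written seed utterance per intent. The intent names, seed utterances, and sample utterances are all hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def char_ngrams(text, n=3):
    """Bag of character n-grams (an assumed stand-in for fastText embeddings)."""
    padded = f"<{text.lower()}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def assign_to_seeds(vectors, seed_vectors):
    """Label each vector with the intent of its nearest seed utterance."""
    return [
        min(seed_vectors, key=lambda name: cosine_distance(v, seed_vectors[name]))
        for v in vectors
    ]


def silhouette_score(vectors, labels):
    """Mean silhouette coefficient, (b - a) / max(a, b), over all points."""
    scores = []
    for i, (v, lab) in enumerate(zip(vectors, labels)):
        dists = {}  # cluster label -> distances from point i to its members
        for j, (w, other) in enumerate(zip(vectors, labels)):
            if i != j:
                dists.setdefault(other, []).append(cosine_distance(v, w))
        if lab not in dists or len(dists) < 2:
            scores.append(0.0)  # singleton or single cluster: score taken as 0
            continue
        a = mean(dists[lab])  # mean distance within the point's own cluster
        b = min(mean(d) for c, d in dists.items() if c != lab)  # nearest other
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


# Hypothetical seed utterances, one per intent.
seeds = {
    "record_vitals": "record blood pressure",
    "record_meal": "record meal intake",
}
# Hypothetical unlabeled utterances to be labeled automatically.
utterances = [
    "please record blood pressure 120 over 80",
    "record the blood pressure now",
    "record meal intake for lunch",
    "log meal intake at dinner",
]

vectors = [char_ngrams(u) for u in utterances]
seed_vectors = {name: char_ngrams(text) for name, text in seeds.items()}
labels = assign_to_seeds(vectors, seed_vectors)
score = silhouette_score(vectors, labels)
print(labels, round(score, 2))
```

A silhouette score close to 1 indicates compact, well-separated clusters, and a score near 0 indicates overlapping ones. In practice the trigram vectors would be replaced by learned embeddings such as fastText.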

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8094/7564988/e7008350bdd0/jpm-10-00062-g001.jpg
