Automatic Labeled Dialogue Generation for Nursing Record Systems

Authors

Mairittha Tittaya, Mairittha Nattaya, Inoue Sozo

Affiliation

Graduate School of Engineering, Kyushu Institute of Technology, 1-1 Sensui-cho, Tobata-ku, Kitakyushu-shi, Fukuoka 804-8550, Japan.

Publication information

J Pers Med. 2020 Jul 16;10(3):62. doi: 10.3390/jpm10030062.

DOI:10.3390/jpm10030062

PMID:32708593

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7564988/

Abstract

The integration of digital voice assistants in nursing residences is becoming increasingly important for improving nursing productivity in documentation. A key idea behind such a system is training natural language understanding (NLU) modules that enable the machine to classify the purpose of a user utterance (intent) and extract valuable pieces of information present in the utterance (entities). One of the main obstacles to building robust NLU is the lack of sufficient labeled data, which generally relies on human labeling. This process is cost-intensive and time-consuming, particularly in the high-level nursing care domain, which requires abstract knowledge. In this paper, we propose an automatic dialogue labeling framework for NLU tasks, specifically for nursing record systems. First, we apply data augmentation techniques to create a collection of variant sample utterances. The individual evaluation result strongly shows a stratification rate, with regard to both fluency and accuracy in utterances. We also investigate the possibility of applying deep generative models to our augmented dataset. A preliminary character-based model built on long short-term memory (LSTM) obtains an accuracy of 90% and generates varied, reasonable texts with a BLEU score of 0.76. Second, we introduce an approach to intent and entity labeling that uses feature embeddings and semantic similarity-based clustering. We also empirically evaluate different embedding methods to find the representations best suited to our data and clustering tasks. Experimental results show that fastText embeddings perform strongly on both intent labeling and entity labeling, achieving F1-scores of 0.79 and 0.78 and silhouette scores of 0.67 and 0.61, respectively.
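The abstract's second step, labeling utterances by clustering their embeddings on semantic similarity and scoring the result with a silhouette score, can be illustrated with a small sketch. This is not the authors' implementation. It assumes raw character trigram counts as a crude stand-in for fastText subword embeddings, a greedy threshold-based clustering rule, and cosine distance. The function names, the threshold value, and the sample utterances are all hypothetical.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Sparse bag of character n-grams (an assumed stand-in for fastText vectors)."""
    padded = f"<{text.lower()}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(utterances, threshold=0.3):
    """Greedy clustering: join the first cluster whose seed is similar enough,
    otherwise start a new cluster. The threshold is an arbitrary assumption."""
    seeds, labels = [], []
    for utterance in utterances:
        vec = embed(utterance)
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


def silhouette(utterances, labels):
    """Mean silhouette coefficient under cosine distance (1 - similarity).
    Points in singleton clusters score 0, following the usual convention."""
    vecs = [embed(u) for u in utterances]
    scores = []
    for i, vi in enumerate(vecs):
        dists = {}  # cluster label -> distances from point i to its members
        for j, vj in enumerate(vecs):
            if i != j:
                dists.setdefault(labels[j], []).append(1.0 - cosine(vi, vj))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                            # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical nursing-record utterances, for illustration only.
    samples = [
        "record blood pressure 120",
        "record blood pressure 135",
        "log meal intake",
        "log meal intake half",
    ]
    assigned = cluster(samples)
    print(assigned, round(silhouette(samples, assigned), 2))
```

Each resulting cluster index would still need to be mapped to a named intent (or entity type) by some other means, for example by inspecting a few members per cluster. The silhouette score only measures how well separated the clusters are, not whether they match the true labels.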


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8094/7564988/e7008350bdd0/jpm-10-00062-g001.jpg


