Muhammad Yusuf Idris, Salim Naomie, Zainal Anazida
Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia.
PeerJ Comput Sci. 2024 Oct 8;10:e2346. doi: 10.7717/peerj-cs.2346. eCollection 2024.
Understanding spoken language is crucial for conversational agents, with intent detection and slot filling being the primary tasks in natural language understanding (NLU). Improving these NLU tasks can yield more accurate and efficient virtual assistants, thereby reducing the need for human intervention and expanding their applicability to other domains. Traditionally, the two tasks have been addressed individually, but recent studies have highlighted their interconnection, suggesting that better results are obtained when they are solved jointly. Recent advances in natural language processing have shown that pretrained word embeddings can enhance text representation and improve the generalization capabilities of models. However, poor generalization in joint learning models for intent detection and slot filling remains a challenge because annotated datasets are limited. Additionally, traditional models have difficulty capturing both the semantic and syntactic nuances of language, which are vital for accurate intent detection and slot filling. This study proposes a hybridized text representation method using a multichannel convolutional neural network with three embedding channels: non-contextual embeddings for semantic information, part-of-speech (POS) tag embeddings for syntactic features, and contextual embeddings for deeper contextual understanding. Specifically, we utilized word2vec for the non-contextual embeddings, one-hot vectors for the POS tags, and bidirectional encoder representations from transformers (BERT) for the contextual embeddings. These embeddings are processed through a convolutional layer and a shared bidirectional long short-term memory (BiLSTM) network, followed by two softmax classifiers for intent detection and slot filling. Experiments on the Air Travel Information System (ATIS) and SNIPS datasets demonstrated that our model significantly outperformed the baseline models, achieving an intent accuracy of 97.90% and a slot filling F1-score of 98.86% on ATIS, and an intent accuracy of 98.88% and a slot filling F1-score of 97.07% on SNIPS. These results highlight the effectiveness of the proposed approach in advancing dialogue systems, paving the way for more accurate and efficient natural language understanding in real-world applications.
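To make the described pipeline concrete, the following is a minimal PyTorch sketch of a three-channel model of this kind: word2vec-style token embeddings, projected one-hot POS vectors, and precomputed BERT vectors fused through a convolutional layer, a shared BiLSTM, and two softmax heads. All layer widths, the concatenation-based channel fusion, and the mean pooling used to form an utterance-level vector for the intent head are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a three-channel joint intent/slot model.
# Dimensions, fusion strategy, and pooling are assumptions for illustration.
import torch
import torch.nn as nn

class MultichannelJointNLU(nn.Module):
    def __init__(self, vocab_size, num_pos_tags, num_intents, num_slots,
                 word_dim=300, bert_dim=768, conv_channels=128, hidden=128):
        super().__init__()
        # Channel 1: non-contextual word embeddings (can be initialized from word2vec).
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # Channel 2: one-hot POS tag vectors, projected to a dense representation.
        self.pos_proj = nn.Linear(num_pos_tags, word_dim)
        # Channel 3: contextual embeddings (e.g., precomputed BERT token outputs).
        self.bert_proj = nn.Linear(bert_dim, word_dim)
        # Convolution over the concatenated channels along the token axis.
        self.conv = nn.Conv1d(3 * word_dim, conv_channels, kernel_size=3, padding=1)
        # Shared BiLSTM over the convolved sequence.
        self.bilstm = nn.LSTM(conv_channels, hidden, batch_first=True,
                              bidirectional=True)
        # Two heads: utterance-level intent classification and per-token slot labeling.
        self.intent_head = nn.Linear(2 * hidden, num_intents)
        self.slot_head = nn.Linear(2 * hidden, num_slots)

    def forward(self, word_ids, pos_onehot, bert_vecs):
        # word_ids: (B, T); pos_onehot: (B, T, num_pos_tags); bert_vecs: (B, T, bert_dim)
        x = torch.cat([self.word_emb(word_ids),
                       self.pos_proj(pos_onehot),
                       self.bert_proj(bert_vecs)], dim=-1)            # (B, T, 3*word_dim)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)  # (B, T, conv_channels)
        h, _ = self.bilstm(x)                                         # (B, T, 2*hidden)
        # Mean-pool token states for the intent decision (an assumption);
        # slot logits are produced per token.
        intent_logits = self.intent_head(h.mean(dim=1))               # (B, num_intents)
        slot_logits = self.slot_head(h)                               # (B, T, num_slots)
        return intent_logits, slot_logits
```

In a joint setup such as this, training would typically minimize the sum of a cross-entropy loss over intents and a per-token cross-entropy loss over slot labels, so that gradients from both tasks shape the shared BiLSTM representation.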