National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China.
National Key Laboratory of Air Traffic Control Automation System Technology, Sichuan University, Chengdu 610065, China.
Sensors (Basel). 2019 Feb 7;19(3):679. doi: 10.3390/s19030679.
To obtain real-time controlling dynamics in the air traffic system, a framework is proposed to introduce and process air traffic control (ATC) speech from radiotelephony communication. A pipeline based on automatic speech recognition (ASR) and controlling instruction understanding (CIU) is designed to convert ATC speech into ATC-related elements, i.e., the controlling intent and its parameters. A correction procedure is also proposed to improve the reliability of the information obtained by the framework. In the ASR model, an acoustic model (AM), a pronunciation model (PM), and phoneme- and word-based language models (LMs) are proposed to unify multilingual ASR into one model. Based on their tasks, the AM and PM are defined as speech recognition and machine translation problems, respectively. Two-dimensional convolution and average-pooling layers are designed to address the special challenges of ASR in ATC. An encoder-decoder neural network is proposed to translate phoneme labels into word labels, which completes the ASR task. In the CIU model, a recurrent neural network-based joint model is proposed to detect the controlling intent and label the controlling parameters; the two tasks are solved in one network so that each improves the other, based on ATC communication rules. Together, the proposed ASR and CIU models convert ATC speech into ATC-related elements. To further improve the accuracy of the sensing framework, the correction procedure revises minor mistakes in the ASR decoding results using flight information such as flight plans and ADS-B data. The proposed models are trained on real operating data and applied to a civil aviation airport in China to evaluate their performance. Experimental results show that the proposed framework can obtain real-time controlling dynamics with high performance, at a word error rate of only 4%. The decoding efficiency also meets the requirement of real-time applications, with an average real-time factor of 0.147. With the proposed framework and the traffic dynamics it obtains, current ATC applications can be accomplished with higher accuracy. In addition, the proposed ASR pipeline is highly reusable, so it can be applied to other controlling scenes and languages with minor changes.
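The two figures reported above, word error rate and real-time factor, follow standard definitions, and a minimal sketch of how such numbers are usually computed may help in reading them. The abstract does not describe the authors' evaluation code, so everything here is an assumption for illustration: the function names `word_error_rate` and `real_time_factor`, the whitespace tokenization (Chinese transcripts would normally be scored per character or after word segmentation), and the example transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are separated by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """Decoding time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from the paper's data.
    ref = "climb to flight level nine zero"
    hyp = "climb to flight level five zero"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
    print(f"RTF: {real_time_factor(1.47, 10.0):.3f}")
```

Under these definitions, a real-time factor of 0.147 means that decoding takes about 0.147 seconds per second of audio on average.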