Wang Yan
Anhui Wenda University of Information Engineering, Hefei, Anhui, China.
PLoS One. 2025 Aug 18;20(8):e0329381. doi: 10.1371/journal.pone.0329381. eCollection 2025.
English oral translation is a critical conduit for international communication and cultural exchange. However, variation in pronunciation and the rapid pace of spoken language limit the effectiveness of current synchronous translation methods. To improve the quality and efficiency of synchronous oral translation, this paper explores the integration of cross-modal semantic understanding and synchronization enhancement for English oral translation. The approach begins by setting up a cross-modal translation scenario. The text sequence derived from this process is then fused with the original speech features via Bidirectional Encoder Representations from Transformers (BERT). The cross-information between modalities is explored, and a linear transformation optimization is applied to the self-attention mechanism of the Transformer to achieve context-aware understanding of the transcribed speech. Finally, dynamic time warping (DTW) is integrated to enhance real-time synchronization between speech and text, thereby improving translation fluency. Experimental results show that, compared with an existing bilingual attention neural machine translation (NMT) model and a context-aware NMT model, the proposed model achieves an average bilingual evaluation understudy (BLEU) score that is 9.3% and 26.9% higher, respectively. Its synchronization speed also exceeds that of the two baseline models by 17.9% and 16.8%, respectively. These findings suggest that a fusion model combining context-awareness and an attention mechanism in cross-modal translation can significantly improve the quality and efficiency of English oral translation, offering a novel approach to the synchronous translation of spoken English.
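The abstract does not specify how DTW is applied to the speech and text streams, so the following is only a minimal sketch of textbook dynamic time warping, assuming each stream has already been reduced to a sequence of comparable values (for example, per-frame or per-token features). The names `dtw` and `dist` are hypothetical and are not taken from the paper.

```python
from math import inf


def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Textbook DTW: return (total cost, alignment path) for sequences a and b.

    Assumption: a and b are non-empty sequences whose elements can be
    compared with `dist`. The path is a list of (index_in_a, index_in_b).
    """
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )

    # Backtrack from the end to recover one optimal alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

Calling `dtw(speech_features, text_features)` on two such sequences returns the cumulative alignment cost and the index pairs that map positions in one stream onto positions in the other. This quadratic-time version is for illustration only; a real-time system would more plausibly use a windowed or incremental variant.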