Joint speech and text machine translation for up to 100 languages.

Publication information

Nature. 2025 Jan;637(8046):587-593. doi: 10.1038/s41586-024-08359-z. Epub 2025 Jan 15.

DOI: 10.1038/s41586-024-08359-z
PMID: 39815098
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11735396/
Abstract

Creating the Babel Fish, a tool that helps individuals translate speech between any two languages, requires advanced technological innovation and linguistic expertise. Although conventional speech-to-speech translation systems exist, they are composed of multiple subsystems that perform translation in a cascaded fashion; scalable and high-performing unified systems remain underexplored. To address this gap, here we introduce SEAMLESSM4T (Massively Multilingual and Multimodal Machine Translation), a single model that supports speech-to-speech translation (from 101 to 36 languages), speech-to-text translation (from 101 to 96 languages), text-to-speech translation (from 96 to 36 languages), text-to-text translation (96 languages) and automatic speech recognition (96 languages). Built using a new multimodal corpus of automatically aligned speech translations and other publicly available data, SEAMLESSM4T is one of the first multilingual systems that can translate from and into English for both speech and text. Moreover, it outperforms the existing state-of-the-art cascaded systems, achieving up to 8% and 23% higher BLEU (Bilingual Evaluation Understudy) scores in speech-to-text and speech-to-speech tasks, respectively. Beyond quality, when tested for robustness, our system is, on average, approximately 50% more resilient against background noise and speaker variations in speech-to-text tasks than the previous state-of-the-art systems. We evaluated SEAMLESSM4T on added toxicity and gender bias to assess translation safety. For the former, we included two mitigation strategies for added toxicity, applied at either training or inference time. Finally, all contributions in this work are publicly available for non-commercial use to propel further research on inclusive speech translation technologies.
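The task coverage listed in the abstract can be laid out as a small lookup table. The following is only an illustrative sketch: the `TaskCoverage` class, the `COVERAGE` list and the `tasks_with_output` helper are hypothetical names introduced here and are not part of any SEAMLESSM4T release. The language counts are copied from the abstract. Where the abstract gives a single count (text-to-text translation and automatic speech recognition), the sketch assumes the same number applies to both the source and the target side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCoverage:
    """One task named in the abstract (hypothetical container, not a real API)."""
    name: str
    source_modality: str   # "speech" or "text"
    target_modality: str   # "speech" or "text"
    source_languages: int  # count stated in the abstract
    target_languages: int  # count stated in the abstract


# Counts are taken from the abstract. For the last two rows the abstract
# gives one number only; using it for both sides is an assumption.
COVERAGE = [
    TaskCoverage("speech-to-speech translation", "speech", "speech", 101, 36),
    TaskCoverage("speech-to-text translation", "speech", "text", 101, 96),
    TaskCoverage("text-to-speech translation", "text", "speech", 96, 36),
    TaskCoverage("text-to-text translation", "text", "text", 96, 96),
    TaskCoverage("automatic speech recognition", "speech", "text", 96, 96),
]


def tasks_with_output(modality: str) -> list[str]:
    """Return the names of tasks whose output is in the given modality."""
    return [t.name for t in COVERAGE if t.target_modality == modality]


if __name__ == "__main__":
    for task in COVERAGE:
        print(f"{task.name}: {task.source_languages} -> {task.target_languages} languages")
    print("Speech output:", tasks_with_output("speech"))
```

Laid out this way, the asymmetry in the abstract is easy to see: speech input covers more languages (101) than speech output (36), while text covers 96 languages.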



