Dual Representation Learning From Fetal Ultrasound Video And Sonographer Audio.

Authors

Gridach Mourad, Alsharid Mohammad, Jiao Jianbo, Drukker Lior, Papageorghiou Aris T, Noble J Alison

Affiliations

University of Oxford.

Khalifa University.

Publication information

Proc IEEE Int Symp Biomed Imaging. 2024 May 27;2024:1-4. doi: 10.1109/ISBI56570.2024.10635693.

Abstract

This paper tackles the challenging problem of self-supervised representation learning from real-world data in two modalities: fetal ultrasound (US) video and the corresponding speech recorded while a sonographer performs a pregnancy scan. We propose to transfer knowledge between the two modalities, even though the sonographer's speech and the US video may not be semantically correlated. We design a network architecture that learns useful representations, such as anatomical features and structures, while recognising the correlation between a US video scan and the sonographer's speech. We introduce dual representation learning from US video and audio, which combines two concepts in a latent feature space: Multi-Modal Contrastive Learning and Multi-Modal Similarity Learning. Experiments show that the proposed architecture learns powerful representations that transfer well to two downstream tasks. Furthermore, we pretrain on two datasets that differ in size and in the length of the video clips (and of the accompanying sonographer speech) to show that the quality of the sonographer's speech plays an important role in final performance.
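
The abstract names two objectives, Multi-Modal Contrastive Learning and Multi-Modal Similarity Learning, but does not give their formulas. The sketch below is therefore only an illustration of how such a pair of losses is commonly combined. It assumes an InfoNCE-style contrastive term and a cosine-based similarity term over paired video and audio embeddings. All function names, the `temperature` and `weight` parameters, and the toy embeddings are hypothetical and are not taken from the paper.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_matrix(video, audio):
    # Entry [i][j] is the cosine similarity between video clip i and audio clip j.
    v = [l2_normalize(x) for x in video]
    a = [l2_normalize(x) for x in audio]
    return [[sum(p * q for p, q in zip(vi, aj)) for aj in a] for vi in v]

def cross_entropy_diagonal(logits):
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(video, audio, temperature=0.1):
    # Assumed InfoNCE-style term: each video should pick out its own audio
    # among the batch, and each audio should pick out its own video.
    sim = cosine_matrix(video, audio)
    v2a = [[s / temperature for s in row] for row in sim]
    a2v = [list(col) for col in zip(*v2a)]
    return 0.5 * (cross_entropy_diagonal(v2a) + cross_entropy_diagonal(a2v))

def similarity_loss(video, audio):
    # Assumed similarity term: pull each paired embedding together directly.
    sim = cosine_matrix(video, audio)
    return sum(1.0 - sim[i][i] for i in range(len(sim))) / len(sim)

def dual_loss(video, audio, weight=1.0, temperature=0.1):
    # Hypothetical weighted sum of the two terms.
    return contrastive_loss(video, audio, temperature) + weight * similarity_loss(video, audio)

# Toy 2-D embeddings: row i of each list belongs to the same clip.
video_emb = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[2.0, 0.0], [0.0, 3.0]]
audio_shuffled = [[0.0, 3.0], [2.0, 0.0]]

print(dual_loss(video_emb, audio_aligned))
print(dual_loss(video_emb, audio_shuffled))
```

With these toy inputs the aligned pairing gives a much lower combined loss than the shuffled one, which is the behaviour both terms are meant to encourage. In practice the embeddings would come from video and audio encoders trained jointly.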
