
Dual Representation Learning From Fetal Ultrasound Video And Sonographer Audio.

Authors

Gridach Mourad, Alsharid Mohammad, Jiao Jianbo, Drukker Lior, Papageorghiou Aris T, Noble J Alison

Affiliations

University of Oxford.

Khalifa University.

Publication information

Proc IEEE Int Symp Biomed Imaging. 2024 May 27;2024:1-4. doi: 10.1109/ISBI56570.2024.10635693.

DOI: 10.1109/ISBI56570.2024.10635693
PMID: 40438701
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7616753/
Abstract

This paper tackles the challenging problem of self-supervised representation learning from real-world data in two modalities: fetal ultrasound (US) video and the corresponding speech recorded while a sonographer performs a pregnancy scan. We propose to transfer knowledge between the two modalities, even though the sonographer's speech and the US video may not be semantically correlated. We design a network architecture capable of learning useful representations, such as anatomical features and structures, while recognising the correlation between a US video scan and the sonographer's speech. We introduce dual representation learning from US video and audio, which combines two concepts in a latent feature space: Multi-Modal Contrastive Learning and Multi-Modal Similarity Learning. Experiments show that the proposed architecture learns powerful representations that transfer well to two downstream tasks. Furthermore, we pretrain on two datasets that differ in size and in the length of the video clips (and of the accompanying sonographer speech) to show that the quality of the sonographer's speech plays an important role in the final performance.
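The abstract names two objectives in a shared latent space but does not give their form. The block below is a minimal sketch of how such a pair of objectives is commonly written, not the authors' implementation. It assumes a symmetric InfoNCE-style loss for the contrastive term and a mean cosine distance for the similarity term. The function names, the `temperature` and `weight` parameters, and the use of pre-computed embeddings in place of real video and audio encoders are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def contrastive_loss(video_emb, audio_emb, temperature=0.1):
    """Assumed form: symmetric InfoNCE over a batch of paired embeddings.

    Row i of video_emb and row i of audio_emb are treated as a positive
    pair; every other row in the batch acts as a negative.
    """
    v = l2_normalize(video_emb)
    a = l2_normalize(audio_emb)
    logits = v @ a.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(v))
    loss_v2a = -log_softmax(logits)[idx, idx].mean()    # video -> audio
    loss_a2v = -log_softmax(logits.T)[idx, idx].mean()  # audio -> video
    return 0.5 * (loss_v2a + loss_a2v)


def similarity_loss(video_emb, audio_emb):
    """Assumed form: mean cosine distance between paired embeddings."""
    v = l2_normalize(video_emb)
    a = l2_normalize(audio_emb)
    return (1.0 - (v * a).sum(axis=1)).mean()


def dual_loss(video_emb, audio_emb, weight=1.0, temperature=0.1):
    """Hypothetical combination: contrastive term plus weighted similarity term."""
    return (contrastive_loss(video_emb, audio_emb, temperature)
            + weight * similarity_loss(video_emb, audio_emb))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.normal(size=(8, 16))                 # stand-in for video encoder output
    audio = video + 0.1 * rng.normal(size=(8, 16))   # stand-in for audio encoder output
    print("dual loss:", dual_loss(video, audio))
```

In this sketch the contrastive term separates mismatched video and audio clips within a batch, while the similarity term pulls each matched pair together directly. How the paper weights or structures the two terms is not stated in the abstract.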


Similar articles

1
Dual Representation Learning From Fetal Ultrasound Video And Sonographer Audio.
Proc IEEE Int Symp Biomed Imaging. 2024 May 27;2024:1-4. doi: 10.1109/ISBI56570.2024.10635693.
2
Self-supervised Contrastive Video-Speech Representation Learning for Ultrasound.
Med Image Comput Comput Assist Interv. 2020 Oct;12263:534-543. doi: 10.1007/978-3-030-59716-0_51.
3
Audio-visual modelling in a clinical setting.
Sci Rep. 2024 Jul 6;14(1):15569. doi: 10.1038/s41598-024-66160-4.
4
Memory-based unsupervised video clinical quality assessment with multi-modality data in fetal ultrasound.
Med Image Anal. 2023 Dec;90:102977. doi: 10.1016/j.media.2023.102977. Epub 2023 Sep 23.
5
Gaze-probe joint guidance with multi-task learning in obstetric ultrasound scanning.
Med Image Anal. 2023 Dec;90:102981. doi: 10.1016/j.media.2023.102981. Epub 2023 Sep 29.
6
Unsupervised Modality-Transferable Video Highlight Detection With Representation Activation Sequence Learning.
IEEE Trans Image Process. 2024;33:1911-1922. doi: 10.1109/TIP.2024.3372469. Epub 2024 Mar 12.
7
Gaze-assisted automatic captioning of fetal ultrasound videos using three-way multi-modal deep neural networks.
Med Image Anal. 2022 Nov;82:102630. doi: 10.1016/j.media.2022.102630. Epub 2022 Sep 17.
8
Knowledge representation and learning of operator clinical workflow from full-length routine fetal ultrasound scan videos.
Med Image Anal. 2021 Apr;69:101973. doi: 10.1016/j.media.2021.101973. Epub 2021 Jan 23.
9
Anatomy-Aware Contrastive Representation Learning for Fetal Ultrasound.
Comput Vis ECCV. 2022 Oct;2022:422-436. doi: 10.1007/978-3-031-25066-8_23.
10
Self-Supervised Representation Learning for Ultrasound Video.
Proc IEEE Int Symp Biomed Imaging. 2020 Apr 3;2020:1847-1850. doi: 10.1109/ISBI45749.2020.9098666.
