Beyond the Edge: Markerless Pose Estimation of Speech Articulators from Ultrasound and Camera Images Using DeepLabCut.

Affiliations

Clinical Audiology, Speech and Language Research Centre, Queen Margaret University, Musselburgh EH21 6UU, UK.

Articulate Instruments Ltd., Musselburgh EH21 6UU, UK.

Publication Information

Sensors (Basel). 2022 Feb 2;22(3):1133. doi: 10.3390/s22031133.

DOI: 10.3390/s22031133
PMID: 35161879
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8838804/
Abstract

Automatic feature extraction from images of speech articulators is currently achieved by detecting edges. Here, we investigate the use of pose estimation deep neural nets with transfer learning to perform markerless estimation of speech articulator keypoints using only a few hundred hand-labelled images as training input. Midsagittal ultrasound images of the tongue, jaw, and hyoid and camera images of the lips were hand-labelled with keypoints, trained using DeepLabCut and evaluated on unseen speakers and systems. Tongue surface contours interpolated from estimated and hand-labelled keypoints produced an average mean sum of distances (MSD) of 0.93, s.d. 0.46 mm, compared with 0.96, s.d. 0.39 mm, for two human labellers, and 2.3, s.d. 1.5 mm, for the best performing edge detection algorithm. A pilot set of simultaneous electromagnetic articulography (EMA) and ultrasound recordings demonstrated partial correlation among three physical sensor positions and the corresponding estimated keypoints, and requires further investigation. The accuracy of estimating lip aperture from a camera video was high, with a mean MSD of 0.70, s.d. 0.56 mm compared with 0.57, s.d. 0.48 mm for two human labellers. DeepLabCut was found to be a fast, accurate and fully automatic method of providing unique kinematic data for tongue, hyoid, jaw, and lips.
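The recipe described above (a few hundred hand-labelled frames, transfer learning in DeepLabCut, then inference on recordings from unseen speakers and systems) follows the standard DeepLabCut project loop. As a rough, hedged illustration only, the sketch below shows that loop using DeepLabCut's public API; the video file names, project and experimenter labels are assumptions for illustration, not the authors' released pipeline.

```python
# Rough sketch of a generic DeepLabCut workflow (not the authors' exact pipeline).
# Video file names and project/experimenter labels are illustrative assumptions.
import deeplabcut

# Create a project from an example recording and extract frames to label.
config = deeplabcut.create_new_project(
    "articulator-keypoints", "lab", ["ultrasound_session01.mp4"], copy_videos=True
)
deeplabcut.extract_frames(config, mode="automatic", algo="kmeans")

# Hand-label a few hundred frames in the GUI, then build the training dataset.
deeplabcut.label_frames(config)
deeplabcut.create_training_dataset(config)

# Transfer-learn from the pretrained backbone, evaluate, and run inference
# on recordings from unseen speakers or systems.
deeplabcut.train_network(config)
deeplabcut.evaluate_network(config)
deeplabcut.analyze_videos(config, ["ultrasound_session02.mp4"], save_as_csv=True)
```

The MSD values quoted in the abstract compare estimated and hand-labelled contours. A minimal nearest-neighbour formulation of such a metric, assuming contours are given as (N, 2) arrays of points in millimetres, might look like the following; this is one plausible reading of the metric, not the paper's own implementation.

```python
import numpy as np

def mean_sum_of_distances(contour_a: np.ndarray, contour_b: np.ndarray) -> float:
    """Symmetric mean nearest-neighbour distance between two contours,
    each an (N, 2) array of (x, y) points in the same units (e.g. mm).
    This is an assumed formulation, not the authors' code."""
    pairwise = np.linalg.norm(contour_a[:, None, :] - contour_b[None, :, :], axis=-1)
    return 0.5 * (pairwise.min(axis=1).mean() + pairwise.min(axis=0).mean())
```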


Figures 1–21 and Appendix Figures A1–A2 are available in the open-access full text at PMC: https://pmc.ncbi.nlm.nih.gov/articles/PMC8838804/

Similar Articles

1. Beyond the Edge: Markerless Pose Estimation of Speech Articulators from Ultrasound and Camera Images Using DeepLabCut.
Sensors (Basel). 2022 Feb 2;22(3):1133. doi: 10.3390/s22031133.
2. Human Sensorimotor Cortex Control of Directly Measured Vocal Tract Movements during Vowel Production.
J Neurosci. 2018 Mar 21;38(12):2955-2966. doi: 10.1523/JNEUROSCI.2382-17.2018. Epub 2018 Feb 8.
3. Automatic segmentation of vocal tract articulators in real-time magnetic resonance imaging.
Comput Methods Programs Biomed. 2024 Jan;243:107907. doi: 10.1016/j.cmpb.2023.107907. Epub 2023 Nov 10.
4. A comparison of methods for decoupling tongue and lower lip from jaw movements in 3D articulography.
J Speech Lang Hear Res. 2013 Oct;56(5):1503-16. doi: 10.1044/1092-4388(2013/12-0016). Epub 2013 Jul 9.
5. High-Resolution, Non-Invasive Imaging of Upper Vocal Tract Articulators Compatible with Human Brain Recordings.
PLoS One. 2016 Mar 28;11(3):e0151327. doi: 10.1371/journal.pone.0151327. eCollection 2016.
6. Linear degrees of freedom in speech production: analysis of cineradio- and labio-film data and articulatory-acoustic modeling.
J Acoust Soc Am. 2001 May;109(5 Pt 1):2165-80. doi: 10.1121/1.1361090.
7. Articulatory Range of Movement in Individuals With Dysarthria Secondary to Amyotrophic Lateral Sclerosis.
Am J Speech Lang Pathol. 2018 Aug 6;27(3):996-1009. doi: 10.1044/2018_AJSLP-17-0064.
8. Video-Based Pose Estimation for Gait Analysis in Stroke Survivors during Clinical Assessments: A Proof-of-Concept Study.
Digit Biomark. 2022 Jan 13;6(1):9-18. doi: 10.1159/000520732. eCollection 2022.
9. An investigation of interference between electromagnetic articulography and electroglottography.
JASA Express Lett. 2022 Sep;2(9):095204. doi: 10.1121/10.0014033.
10. Interarticulator Speech Coordination: Timing Is of the Essence.
J Speech Lang Hear Res. 2023 Mar 7;66(3):901-915. doi: 10.1044/2022_JSLHR-22-00594. Epub 2023 Feb 24.

Cited By

1. A Lingual Ultrasound Study of Speech in Patients With Cleft Lip and Palate Following Orthognathic Surgery.
Orthod Craniofac Res. 2025 Apr 11. doi: 10.1111/ocr.12926.
2. Dimensionality Reduction in Lingual Articulation of Vowels: Evidence From Lax Vowels in Northern Anglo-English.
Lang Speech. 2025 Sep;68(3):689-721. doi: 10.1177/00238309251320581. Epub 2025 Mar 25.
3. 3D markerless tracking of speech movements with submillimeter accuracy.
bioRxiv. 2025 Feb 16:2025.02.13.638009. doi: 10.1101/2025.02.13.638009.
4. DeepLabCut custom-trained model and the refinement function for gait analysis.
Sci Rep. 2025 Jan 18;15(1):2364. doi: 10.1038/s41598-025-85591-1.
5. A comparison of point-tracking algorithms in ultrasound videos from the upper limb.
Biomed Eng Online. 2023 May 24;22(1):52. doi: 10.1186/s12938-023-01105-y.
6. An initial framework for use of ultrasound by speech and language therapists in the UK: Scope of practice, education and governance.
Ultrasound. 2023 May;31(2):92-103. doi: 10.1177/1742271X221122562. Epub 2022 Oct 12.
7. A systematic review of the applications of markerless motion capture (MMC) technology for clinical measurement in rehabilitation.
J Neuroeng Rehabil. 2023 May 2;20(1):57. doi: 10.1186/s12984-023-01186-9.
8. An Ultrasound Investigation of Tongue Dorsum Raising in Children with Cleft Palate +/- Cleft Lip.
Cleft Palate Craniofac J. 2024 Jul;61(7):1104-1115. doi: 10.1177/10556656231158965. Epub 2023 Feb 27.
9. Optimizing the Ultrasound Tongue Image Representation for Residual Network-Based Articulatory-to-Acoustic Mapping.
Sensors (Basel). 2022 Nov 8;22(22):8601. doi: 10.3390/s22228601.
10. Machine-learning-based video analysis of grasping behavior during recovery from cervical spinal cord injury.
Behav Brain Res. 2023 Apr 12;443:114150. doi: 10.1016/j.bbr.2022.114150. Epub 2022 Oct 7.

References

1. Distance vs time. Acoustic and articulatory consequences of reduced vowel duration in Polish.
J Acoust Soc Am. 2021 Jul;150(1):592. doi: 10.1121/10.0005585.
2. A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images.
Sci Data. 2021 Jul 20;8(1):187. doi: 10.1038/s41597-021-00976-x.
3. Automatic vocal tract landmark localization from midsagittal MRI data.
Sci Rep. 2020 Jan 30;10(1):1468. doi: 10.1038/s41598-020-58103-6.
4. Deep learning tools for the measurement of animal behavior in neuroscience.
Curr Opin Neurobiol. 2020 Feb;60:1-11. doi: 10.1016/j.conb.2019.10.008. Epub 2019 Nov 29.
5. DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning.
Elife. 2019 Oct 1;8:e47994. doi: 10.7554/eLife.47994.
6. Using DeepLabCut for 3D markerless pose estimation across species and behaviors.
Nat Protoc. 2019 Jul;14(7):2152-2176. doi: 10.1038/s41596-019-0176-0. Epub 2019 Jun 21.
7. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning.
Nat Neurosci. 2018 Sep;21(9):1281-1289. doi: 10.1038/s41593-018-0209-y. Epub 2018 Aug 20.
8. Multi-hypothesis tracking of the tongue surface in ultrasound video recordings of normal and impaired speech.
Med Image Anal. 2018 Feb;44:98-114. doi: 10.1016/j.media.2017.12.003. Epub 2017 Dec 5.
9. A comparative study on the contour tracking algorithms in ultrasound tongue images with automatic re-initialization.
J Acoust Soc Am. 2016 May;139(5):EL154. doi: 10.1121/1.4951024.
10. Tongue contour tracking in dynamic ultrasound via higher-order MRFs and efficient fusion moves.
Med Image Anal. 2012 Dec;16(8):1503-20. doi: 10.1016/j.media.2012.07.001. Epub 2012 Aug 1.