Beyond the Edge: Markerless Pose Estimation of Speech Articulators from Ultrasound and Camera Images Using DeepLabCut.

Affiliations

Clinical Audiology, Speech and Language Research Centre, Queen Margaret University, Musselburgh EH21 6UU, UK.

Articulate Instruments Ltd., Musselburgh EH21 6UU, UK.

Publication information

Sensors (Basel). 2022 Feb 2;22(3):1133. doi: 10.3390/s22031133.

DOI:10.3390/s22031133

PMID:35161879

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8838804/

Abstract

Automatic feature extraction from images of speech articulators is currently achieved by detecting edges. Here, we investigate the use of pose estimation deep neural nets with transfer learning to perform markerless estimation of speech articulator keypoints using only a few hundred hand-labelled images as training input. Midsagittal ultrasound images of the tongue, jaw, and hyoid and camera images of the lips were hand-labelled with keypoints; models were trained on them using DeepLabCut and evaluated on unseen speakers and systems. Tongue surface contours interpolated from estimated and hand-labelled keypoints produced an average mean sum of distances (MSD) of 0.93 mm (s.d. 0.46 mm), compared with 0.96 mm (s.d. 0.39 mm) for two human labellers and 2.3 mm (s.d. 1.5 mm) for the best-performing edge detection algorithm. A pilot set of simultaneous electromagnetic articulography (EMA) and ultrasound recordings demonstrated partial correlation between three physical sensor positions and the corresponding estimated keypoints; this result requires further investigation. The accuracy of estimating lip aperture from camera video was high, with a mean MSD of 0.70 mm (s.d. 0.56 mm), compared with 0.57 mm (s.d. 0.48 mm) for two human labellers. DeepLabCut was found to be a fast, accurate and fully automatic method of providing unique kinematic data for the tongue, hyoid, jaw, and lips.
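
The abstract reports contour agreement as a mean sum of distances (MSD) but does not state the formula. The sketch below illustrates one commonly used symmetric nearest-neighbour form of MSD, applied to contours interpolated from keypoints. The function names `resample_contour` and `mean_sum_of_distances`, the use of linear interpolation along arc length, and the normalisation by the total number of points are all assumptions made for illustration and may differ from the authors' implementation.

```python
import numpy as np


def resample_contour(keypoints, n_points=100):
    """Linearly interpolate an ordered list of (x, y) keypoints (in mm)
    to n_points spaced evenly along the polyline's arc length.
    Hypothetical helper: the paper's interpolation scheme may differ.
    Assumes consecutive keypoints are distinct."""
    pts = np.asarray(keypoints, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])          # cumulative arc length
    t = np.linspace(0.0, s[-1], n_points)                # target positions
    x = np.interp(t, s, pts[:, 0])
    y = np.interp(t, s, pts[:, 1])
    return np.column_stack([x, y])


def mean_sum_of_distances(u, v):
    """Symmetric nearest-neighbour MSD between two contours.
    Assumed definition: for every point on each contour, take the distance
    to the closest point on the other contour, then average over all points."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Pairwise Euclidean distances, shape (len(u), len(v)).
    d = np.linalg.norm(u[:, None, :] - v[None, :, :], axis=2)
    u_to_v = d.min(axis=1).sum()   # each u point to its nearest v point
    v_to_u = d.min(axis=0).sum()   # each v point to its nearest u point
    return (u_to_v + v_to_u) / (len(u) + len(v))


# Toy example: two straight "contours" 1 mm apart (made-up coordinates).
estimated = resample_contour([(0, 0), (1, 0), (2, 0)], n_points=5)
labelled = resample_contour([(0, 1), (1, 1), (2, 1)], n_points=5)
msd_mm = mean_sum_of_distances(estimated, labelled)
print(msd_mm)  # 1.0
```

Because the measure is symmetric, swapping the estimated and hand-labelled contours gives the same value, which suits a comparison between two human labellers where neither is the reference.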
