
Discriminative analysis of lip motion features for speaker identification and speech-reading.

Authors

Cetingül H Ertan, Yemez Yücel, Erzin Engin, Tekalp A Murat

Affiliation

Multimedia, Vision and Graphics Laboratory, College of Engineering, Koç University, Sariyer, Istanbul, Turkey.

Publication

IEEE Trans Image Process. 2006 Oct;15(10):2879-91. doi: 10.1109/tip.2006.877528.

DOI: 10.1109/tip.2006.877528
PMID: 17022256
Abstract

There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) is using explicit lip motion information useful, and 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage, spatial, and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and that lip motion features prove more valuable in the speech-reading application.
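
The abstract does not say which discrimination criterion the authors use, so the following is only a rough illustration of the general idea of ranking candidate features by how well they separate classes (speakers, or words and phrases). It is a minimal sketch that assumes a per-feature Fisher ratio (between-class variance over within-class variance) as a stand-in criterion; it is not the paper's two-stage spatial and temporal analysis. The function names `fisher_scores` and `select_top_features` and the toy data are hypothetical.

```python
import numpy as np


def fisher_scores(X, y):
    """Per-feature ratio of between-class to within-class scatter.

    X: (n_samples, n_features) feature matrix (stand-in for lip motion features).
    y: (n_samples,) class labels (stand-in for speaker or word identities).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        # Scatter of the class mean around the overall mean, weighted by class size.
        between += len(Xc) * (class_mean - overall_mean) ** 2
        # Scatter of the samples around their own class mean.
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    # Small constant guards against division by zero for constant features.
    return between / (within + 1e-12)


def select_top_features(X, y, k):
    """Indices of the k features with the highest Fisher score, best first."""
    scores = fisher_scores(X, y)
    return np.argsort(scores)[::-1][:k]


# Toy data (assumption): 3 classes, 6 features, only features 2 and 4 carry class information.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 50)
X = rng.normal(size=(labels.size, 6))
X[:, 2] += 3.0 * labels  # strongly discriminative feature
X[:, 4] += 1.5 * labels  # weakly discriminative feature

top = select_top_features(X, labels, k=2)
print(top)
```

In a setting like the one the abstract describes, the selected features would then be passed to a recognizer (the paper uses a hidden-Markov-model-based system); that stage is omitted here.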

Similar articles

1
Discriminative analysis of lip motion features for speaker identification and speech-reading.
IEEE Trans Image Process. 2006 Oct;15(10):2879-91. doi: 10.1109/tip.2006.877528.
2
Hybrid simulated annealing and its application to optimization of hidden Markov models for visual speech recognition.
IEEE Trans Syst Man Cybern B Cybern. 2010 Aug;40(4):1188-96. doi: 10.1109/TSMCB.2009.2036753. Epub 2010 Jan 8.
3
Multistream articulatory feature-based models for visual speech recognition.
IEEE Trans Pattern Anal Mach Intell. 2009 Sep;31(9):1700-7. doi: 10.1109/TPAMI.2008.303.
4
Performance enhancement for audio-visual speaker identification using dynamic facial muscle model.
Med Biol Eng Comput. 2006 Oct;44(10):919-30. doi: 10.1007/s11517-006-0106-5. Epub 2006 Sep 26.
5
Lip segmentation under MAP-MRF framework with automatic selection of local observation scale and number of segments.
IEEE Trans Image Process. 2014 Aug;23(8):3397-411. doi: 10.1109/TIP.2014.2331137. Epub 2014 Jun 17.
6
Lip image segmentation using fuzzy clustering incorporating an elliptic shape function.
IEEE Trans Image Process. 2004 Jan;13(1):51-62. doi: 10.1109/tip.2003.818116.
7
Fast free-vibration modal analysis of 2-D physics-based deformable objects.
IEEE Trans Image Process. 2005 Mar;14(3):281-93. doi: 10.1109/tip.2004.838693.
8
Face description with local binary patterns: application to face recognition.
IEEE Trans Pattern Anal Mach Intell. 2006 Dec;28(12):2037-41. doi: 10.1109/TPAMI.2006.244.
9
Feature extraction using recursive cluster-based linear discriminant with application to face recognition.
IEEE Trans Image Process. 2006 Dec;15(12):3824-32. doi: 10.1109/tip.2006.884932.
10
Personal recognition using hand shape and texture.
IEEE Trans Image Process. 2006 Aug;15(8):2454-61. doi: 10.1109/tip.2006.875214.

Cited by

1
Privacy-Preserving Sensor-Based Continuous Authentication and User Profiling: A Review.
Sensors (Basel). 2020 Dec 25;21(1):92. doi: 10.3390/s21010092.