Impact of Audio Data Compression on Feature Extraction for Vocal Biomarker Detection: Validation Study.

Author information

Oreskovic Jessica, Kaufman Jaycee, Fossat Yan

Affiliation

Klick Labs, Toronto, ON, Canada.

Publication information

JMIR Biomed Eng. 2024 Apr 15;9:e56246. doi: 10.2196/56246.

Abstract

BACKGROUND

Vocal biomarkers, derived from acoustic analysis of vocal characteristics, offer noninvasive avenues for medical screening, diagnostics, and monitoring. Previous research demonstrated the feasibility of predicting type 2 diabetes mellitus through acoustic analysis of smartphone-recorded speech. Building upon this work, this study explores the impact of audio data compression on acoustic vocal biomarker development, which is critical for broader applicability in health care.

OBJECTIVE

The objective of this research is to analyze how common audio compression algorithms (MP3, M4A, and WMA) applied by 3 different conversion tools at 2 bitrates affect features crucial for vocal biomarker detection.

METHODS

The impact of audio data compression on acoustic vocal biomarker development was investigated using uncompressed voice samples converted into MP3, M4A, and WMA formats at 2 bitrates (320 and 128 kbps) with MediaHuman (MH) Audio Converter, WonderShare (WS) UniConverter, and Fast Forward Moving Picture Experts Group (FFmpeg). The data set comprised recordings from 505 participants, totaling 17,298 audio files, collected using a smartphone. Participants recorded a fixed English sentence up to 6 times daily for up to 14 days. Feature extraction, including pitch, jitter, intensity, and Mel-frequency cepstral coefficients (MFCCs), was conducted using Python and Parselmouth. The Wilcoxon signed rank test and the Bonferroni correction for multiple comparisons were used for statistical analysis.
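As a rough illustration of the conversion grid described above (3 formats at 2 bitrates), the sketch below builds one FFmpeg command per format and bitrate pair. The encoder names (`libmp3lame`, `aac`, `wmav2`), the output naming scheme, and the helper `build_ffmpeg_commands` are assumptions made for illustration; the abstract does not state which encoders or options the authors used, and the commands are only constructed here, not executed.

```python
from itertools import product
from pathlib import Path

# Assumed encoder per output format; the abstract does not specify these.
ENCODERS = {"mp3": "libmp3lame", "m4a": "aac", "wma": "wmav2"}
# The two target bitrates named in the abstract, in kbps.
BITRATES_KBPS = (320, 128)


def build_ffmpeg_commands(wav_path, out_dir):
    """Return one FFmpeg argument list per (format, bitrate) pair."""
    wav_path, out_dir = Path(wav_path), Path(out_dir)
    commands = []
    for (ext, encoder), kbps in product(ENCODERS.items(), BITRATES_KBPS):
        # Hypothetical naming scheme: <stem>_<bitrate>k.<ext>
        target = out_dir / f"{wav_path.stem}_{kbps}k.{ext}"
        commands.append([
            "ffmpeg", "-y", "-i", str(wav_path),
            "-c:a", encoder, "-b:a", f"{kbps}k", str(target),
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_ffmpeg_commands("sample.wav", "converted"):
        print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.run` once FFmpeg is installed. Whether a given encoder accepts both bitrates for a particular sample rate is not checked in this sketch.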

RESULTS

In this study, 36,970 audio files were initially recorded from 505 participants, with 17,298 recordings meeting the fixed sentence criteria after screening. The conversion tools (MH, WS, and FFmpeg) differed notably in their compression outcomes, such as whether files were encoded at constant or variable bitrates. The analysis covered multiple compression formats and a wide array of voice features and MFCCs. P values from Wilcoxon signed rank tests that fell below the Bonferroni-corrected significance level were taken to indicate significant alterations due to compression. The effects of compression were feature specific and varied across formats and bitrates. MH-converted files were more resilient than WS-converted files. Bitrate also influenced feature stability, with 38 cases affected at only one of the two bitrates. Notably, voice features were more stable than MFCCs across conversion methods.
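The decision rule described above, comparing each Wilcoxon signed rank P value against a Bonferroni-corrected threshold, can be sketched as follows. This is a simplified standard-library version that uses the normal approximation without tie or continuity corrections; it is not the authors' code, and in practice a library routine such as `scipy.stats.wilcoxon` would normally be used. The significance level of 0.05, the feature names, and the sample values are assumptions for illustration only.

```python
from statistics import NormalDist


def wilcoxon_signed_rank_p(x, y):
    """Two-sided P value for paired samples (normal approximation, no corrections)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 1.0  # identical pairs: no evidence of a change
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Sum of ranks of positive differences, compared with its null distribution.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = (n * (n + 1) * (2 * n + 1) / 24) ** 0.5
    z = (w_plus - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bonferroni_flags(p_values, alpha=0.05):
    """Flag each feature whose P value is below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}


if __name__ == "__main__":
    # Hypothetical per-recording feature values before and after compression.
    original = [100.0 + i for i in range(20)]
    compressed = [v + 0.5 + 0.01 * i for i, v in enumerate(original)]
    p_values = {
        "pitch": wilcoxon_signed_rank_p(original, compressed),
        "intensity": wilcoxon_signed_rank_p(original, original),
    }
    print(bonferroni_flags(p_values))
```

Dividing the significance level by the number of tests keeps the family-wise error rate at or below the chosen level, at the cost of making each individual comparison more conservative.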

CONCLUSIONS

Compression effects were found to be feature specific, with MH and FFmpeg showing greater resilience. Some features were consistently affected, emphasizing the importance of understanding feature resilience for diagnostic applications. For the implementation of vocal biomarkers in health care, it is valuable to identify features that remain consistent when audio is compressed for storage or transmission. Because this study focused on specific features and formats, future research could broaden the scope to include more diverse features, real-time compression algorithms, and various recording methods. This study enhances our understanding of audio compression's influence on voice features and MFCCs, providing insights for developing applications across fields. The research underscores the significance of feature stability when working with compressed audio data, laying a foundation for informed voice data use in evolving technological landscapes.
