Sound as a bell: a deep learning approach for health status classification through speech acoustic biomarkers.

Author information

Wang Yanbing, Wang Haiyan, Li Zhuoxuan, Zhang Haoran, Yang Liwen, Li Jiarui, Tang Zixiang, Hou Shujuan, Wang Qi

Affiliations

School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing, 100029, China.

School of Management, Beijing University of Chinese Medicine, Beijing, 100029, China.

Publication information

Chin Med. 2024 Jul 24;19(1):101. doi: 10.1186/s13020-024-00973-3.

DOI:10.1186/s13020-024-00973-3

PMID:39049005

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11267751/

Abstract

BACKGROUND

Human health is a complex, dynamic concept encompassing a spectrum of states influenced by genetic, environmental, physiological, and psychological factors. Traditional Chinese Medicine categorizes health into nine body constitutional types, each reflecting a unique balance or imbalance in vital energies that influences physical, mental, and emotional states. Advances in machine learning models offer promising avenues for diagnosing conditions such as Alzheimer's disease, dementia, and respiratory diseases by analyzing speech patterns, enabling complementary non-invasive disease diagnosis. This study aims to use speech audio to identify subhealth populations characterized by unbalanced constitution types.

METHODS

Participants aged 18-45 were selected from the Acoustic Study of Health. Audio recordings were collected using ATR2500X-USB microphones and Praat software. Exclusion criteria included recent illness, dental issues, and specific medical histories. The audio data were converted to Mel-frequency cepstral coefficients (MFCCs) for model training. Three deep learning models were implemented in Python to classify health status: a 1-dimensional convolutional network (Conv1D), a 2-dimensional convolutional network (Conv2D), and a long short-term memory network (LSTM). Saliency maps were generated to provide model explainability.
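The MFCC preprocessing step can be illustrated with a minimal, dependency-free sketch. The abstract does not state the frame length, hop size, number of mel filters, number of coefficients, or the library the authors used, so every parameter below (`frame_len`, `hop`, `n_filters`, `n_coeffs`) and every function name is an assumption chosen for illustration, and the synthetic 440 Hz tone only stands in for a real recording. The sketch follows the common textbook recipe: windowed frames, power spectrum, triangular mel filterbank, log energies, then a DCT-II.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """One-sided power spectrum via a direct DFT (slow, but needs no libraries)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, frame_len, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((frame_len + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (frame_len // 2 + 1)
        for k in range(left, center):      # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs cepstral coefficients per frame.

    All default parameter values are illustrative assumptions, not the study's settings.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]  # Hamming window
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        power = power_spectrum(frame)
        # Small constant keeps the logarithm finite for empty bands.
        log_energy = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                      for row in bank]
        # DCT-II decorrelates the log mel energies into cepstral coefficients.
        coeffs = [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                      for m, e in enumerate(log_energy))
                  for c in range(n_coeffs)]
        features.append(coeffs)
    return features


# Synthetic stand-in for a recording: 0.1 s of a 440 Hz tone at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
features = mfcc(tone, sample_rate)
print(len(features), len(features[0]))  # 5 frames x 13 coefficients
```

The resulting frames-by-coefficients matrix is the kind of input that a Conv1D or LSTM can consume as a sequence and a Conv2D can treat as a single-channel image. In practice an audio library would replace the hand-written DFT.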

RESULTS

The study used 1,378 recordings from balanced (healthy) types and 1,413 recordings from unbalanced (subhealth) types. The Conv1D model achieved a training accuracy of 91.91% and a validation accuracy of 84.19%. The Conv2D model achieved 96.19% training accuracy and 84.93% validation accuracy. The LSTM model achieved 92.79% training accuracy and 87.13% validation accuracy, with early signs of overfitting. AUC scores were 0.92 and 0.94 (Conv1D), 0.99 (Conv2D), and 0.97 (LSTM). All models demonstrated robust performance, with Conv2D achieving the highest AUC.
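To put the reported figures in context, the short calculation below derives two quantities from the numbers in the abstract: the accuracy of always predicting the larger class, and the gap between training and validation accuracy for each model. The baseline is computed on the full recording counts, which is an assumption, since the abstract does not give the class mix of the validation split. The variable names are illustrative only.

```python
# Figures copied from the abstract.
recordings = {"balanced": 1378, "unbalanced": 1413}
accuracy = {  # model: (training %, validation %)
    "Conv1D": (91.91, 84.19),
    "Conv2D": (96.19, 84.93),
    "LSTM": (92.79, 87.13),
}

total = sum(recordings.values())
# Accuracy of a trivial classifier that always predicts the larger class
# (assumes the validation split mirrors the overall class mix).
majority_baseline = 100.0 * max(recordings.values()) / total
# Training-minus-validation gap in percentage points.
gaps = {name: round(train - val, 2) for name, (train, val) in accuracy.items()}

print(f"total recordings: {total}")                       # 2791
print(f"majority-class baseline: {majority_baseline:.2f}%")  # 50.63%
for name, gap in gaps.items():
    print(f"{name}: train-validation gap = {gap:.2f} points")
# Conv1D: 7.72, Conv2D: 11.26, LSTM: 5.66
```

Because the two classes are nearly the same size, all three validation accuracies sit well above the roughly 50.6% majority-class baseline.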

CONCLUSIONS

Deep learning classification of health status from human speech audio, based on body constitution types, showed promising results with the Conv1D, Conv2D, and LSTM models. Analysis of ROC curves, training accuracy, and validation accuracy showed that all models robustly distinguished between balanced and unbalanced constitution types. Conv2D performed best, while Conv1D and LSTM also performed well, affirming their reliability. The study integrates constitution theory and deep learning technologies to classify subhealth populations using a noninvasive approach, thereby promoting personalized medicine and early intervention strategies.
