De Silva Upeka, Madanian Samaneh, Olsen Sharon, Templeton John Michael, Poellabauer Christian, Schneider Sandra L, Narayanan Ajit, Rubaiat Rahmina
Department of Computer Science and Software Engineering, Auckland University of Technology, Auckland, New Zealand.
Rehabilitation Innovation Centre, Auckland University of Technology, Auckland, New Zealand.
J Med Internet Res. 2025 Jan 13;27:e63004. doi: 10.2196/63004.
Digital biomarkers are increasingly used in clinical decision support for various health conditions. Speech features as digital biomarkers can offer insights into underlying physiological processes due to the complexity of speech production. This process involves respiration, phonation, articulation, and resonance, all of which rely on specific motor systems for the preparation and execution of speech. Deficits in any of these systems can cause changes in speech signal patterns. Increasing efforts are being made to develop speech-based clinical decision support systems.
This systematic scoping review investigated the technological revolution and recent trends in digital clinical speech signal analysis to understand the key concepts and research processes from clinical and technical perspectives.
A systematic scoping review, guided by a set of research questions, was undertaken in 6 databases. Articles that focused on speech signal analysis for clinical decision-making were identified, and the included studies were analyzed quantitatively. A narrower subset of studies investigating neurological diseases was analyzed using qualitative content analysis.
A total of 389 articles met the initial eligibility criteria, of which 72 (18.5%) that focused on neurological diseases were included in the qualitative analysis. In the included studies, Parkinson disease, Alzheimer disease, and cognitive disorders were the most frequently investigated conditions. The literature explored the potential of speech feature analysis in diagnosing neurological conditions, differentiating between them, assessing their severity, and monitoring their treatment. The common speech tasks used were sustained phonations, diadochokinetic tasks, reading tasks, activity-based tasks, picture descriptions, and prompted speech tasks. From these tasks, conventional speech features (such as fundamental frequency, jitter, and shimmer), advanced digital signal processing-based speech features (such as wavelet transformation-based features), and spectrograms in the form of audio images were analyzed. Traditional machine learning and deep learning approaches were used to build predictive models, whereas statistical analysis assessed variable relationships and the reliability of speech features. Model evaluations primarily focused on analytical validation. A significant research gap was identified: the need for a structured research process to guide studies toward potential technological intervention in clinical settings. To address this, a research framework was proposed that adapts a design science research methodology to guide research studies systematically.
The findings highlight how data science techniques can enhance speech signal analysis to support clinical decision-making. By combining knowledge from clinical practice, speech science, and data science within a structured research framework, future research may achieve greater clinical relevance.
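As an illustration of the conventional speech features named in the results (fundamental frequency, jitter, and shimmer), the following is a minimal Python sketch and not a method taken from the reviewed studies. It assumes that glottal cycle periods and per-cycle peak amplitudes have already been extracted from a sustained phonation, and it uses the common "local" definition of jitter and shimmer: the mean absolute difference between consecutive cycles divided by the mean value. The function names `local_perturbation` and `summarize_voice` and the sample values are hypothetical.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value.

    Applied to cycle periods this gives local jitter; applied to per-cycle
    peak amplitudes it gives local shimmer (both as fractions, not percent).
    """
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def summarize_voice(periods_s, peak_amplitudes):
    """Summarize a sustained phonation from pre-extracted cycle measurements.

    periods_s: duration of each glottal cycle in seconds (assumed input).
    peak_amplitudes: peak amplitude of each cycle, arbitrary units (assumed input).
    """
    return {
        # Fundamental frequency approximated as the reciprocal of the mean period.
        "mean_f0_hz": 1.0 / mean(periods_s),
        "jitter_local": local_perturbation(periods_s),
        "shimmer_local": local_perturbation(peak_amplitudes),
    }


if __name__ == "__main__":
    # Made-up illustrative measurements, not data from any study.
    example_periods = [0.010, 0.011, 0.010, 0.011]
    example_amplitudes = [1.0, 0.9, 1.0, 0.9]
    print(summarize_voice(example_periods, example_amplitudes))
```

In practice, cycle detection is the harder step and is typically delegated to a dedicated speech analysis tool; the sketch only shows how the summary features follow once per-cycle measurements are available.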