Berisha Visar, Liss Julie M
School of Electrical, Computer and Energy Engineering and College of Health Solutions, Arizona State University, Tempe, AZ, USA.
College of Health Solutions, Arizona State University, Tempe, AZ, USA.
NPJ Digit Med. 2024 Aug 9;7(1):208. doi: 10.1038/s41746-024-01199-1.
This perspective article explores the challenges and potential of using speech as a biomarker in clinical settings, particularly when constrained by the small clinical datasets typically available in such contexts. We contend that integrating insights from speech science and clinical research can reduce the sample complexity of clinical speech AI models, with the potential to shorten timelines to translation. Most existing models are based on high-dimensional feature representations trained with limited sample sizes and often do not leverage insights from speech science and clinical research. This approach can lead to overfitting, where the models perform exceptionally well on training data but fail to generalize to new, unseen data. Additionally, without incorporating theoretical knowledge, these models may lack interpretability and robustness, making them challenging to troubleshoot or improve post-deployment. We propose a framework for organizing health conditions based on their impact on speech and promote the use of speech analytics in diverse clinical contexts beyond cross-sectional classification. For high-stakes clinical use cases, we advocate for a focus on explainable and individually validated measures and stress the importance of rigorous validation frameworks and ethical considerations for responsible deployment. Bridging the gap between AI research and clinical speech research presents new opportunities for more efficient translation of speech-based AI tools and advancement of scientific discoveries in this interdisciplinary space, particularly when work is limited to small or retrospective datasets.