Department of Computer Science, University of Rochester, Rochester, NY, United States.
Center for Health and Technology, University of Rochester Medical Center, Rochester, NY, United States.
J Med Internet Res. 2021 Oct 19;23(10):e26305. doi: 10.2196/26305.
Access to neurological care for Parkinson disease (PD) is a rare privilege for millions of people worldwide, especially in resource-limited countries. In 2013, there were just 1200 neurologists in India for a population of 1.3 billion people; in Africa, the average population per neurologist exceeds 3.3 million people. In contrast, 60,000 people receive a diagnosis of PD every year in the United States alone, and similar patterns of rising PD cases, fueled mostly by environmental pollution and an aging population, can be seen worldwide. The current projection of more than 12 million patients with PD worldwide by 2040 is only part of the picture, given that more than 20% of patients with PD remain undiagnosed. Timely diagnosis and frequent assessment are key to ensuring prompt and appropriate medical intervention, thus improving the quality of life of patients with PD.
In this paper, we propose a web-based framework that can help anyone anywhere around the world record a short speech task and analyze the recorded data to screen for PD.
We collected data from 726 unique participants (PD: 262/726, 36.1% were women; non-PD: 464/726, 63.9% were women; average age 61 years) from all over the United States and beyond. A small portion of the data (approximately 54/726, 7.4%) was collected in a laboratory setting to compare the performance of the models trained with noisy home environment data against high-quality laboratory-environment data. The participants were instructed to utter a popular pangram containing all the letters in the English alphabet, "the quick brown fox jumps over the lazy dog." We extracted both standard acoustic features (mel-frequency cepstral coefficients and jitter and shimmer variants) and deep learning-based embedding features from the speech data. Using these features, we trained several machine learning algorithms. We also applied model interpretation techniques such as Shapley additive explanations to ascertain the importance of each feature in determining the model's output.
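As an illustration of the jitter and shimmer features mentioned above, the following minimal sketch computes the common "local" variants from a sequence of glottal cycle lengths and per-cycle peak amplitudes. It assumes those per-cycle measurements have already been obtained by a pitch tracker, which is not shown. The function names and sample values are hypothetical, and this is not the feature-extraction code used in the study.

```python
from statistics import mean


def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods_s):
    """Cycle-to-cycle variation of the glottal period (durations in seconds)."""
    return relative_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (any linear unit)."""
    return relative_perturbation(peak_amplitudes)


# Made-up per-cycle measurements, for illustration only.
periods = [0.010, 0.011, 0.010, 0.011]
amplitudes = [0.50, 0.50, 0.50, 0.50]

jitter = local_jitter(periods)
shimmer = local_shimmer(amplitudes)
print(f"local jitter:  {jitter:.4f}")
print(f"local shimmer: {shimmer:.4f}")
```

A perfectly steady voice gives zero for both measures; larger values indicate more cycle-to-cycle instability in pitch (jitter) or loudness (shimmer).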
We achieved an area under the curve of 0.753 for determining the presence of self-reported PD by modeling the standard acoustic features with XGBoost, a gradient-boosted decision tree model. Further analysis revealed that the widely used mel-frequency cepstral coefficient features and a subset of previously validated dysphonia features designed for detecting PD from a verbal phonation task (pronouncing "ahh") influence the model's decision the most.
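For readers unfamiliar with the metric, the area under the receiver operating characteristic curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. The labels and scores are made up for illustration, the function name is hypothetical, and this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 (1 = self-reported PD in this illustration)
    scores: model outputs, where higher means "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up example: 3 positives, 3 negatives.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Under this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.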
Our model performed equally well on data collected in a controlled laboratory environment and in the wild, and across different gender and age groups. Using this tool, we can collect data from almost anyone anywhere with an audio-enabled device and help the participants screen for PD remotely, contributing to equity and access in neurological care.