Ramesh Vishwajith, Vatanparvar Korosh, Nemati Ebrahim, Nathan Viswam, Rahman Md Mahbubur, Kuang Jilong
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:5682-5688. doi: 10.1109/EMBC44109.2020.9175597.
Despite the prevalence of respiratory diseases, diagnosing them remains challenging for clinicians. Accurately assessing airway sounds requires extensive clinical training and equipment that may not be readily available. Current methods that automate this diagnosis are hindered by their reliance on features that require pulmonary function tests. We leverage the audio characteristics of coughs to build classifiers that can distinguish common respiratory diseases in adults. Moreover, we build on recent advances in generative adversarial networks to augment our dataset with synthetic cough samples engineered for each class of major respiratory disease, which both balances the dataset and increases its size. We experimented on cough samples collected with a smartphone from 45 subjects in a clinic. Our CoughGAN-improved Support Vector Machine and Random Forest models achieve up to 76% test accuracy and an 83% F1 score in classifying subjects' conditions among healthy and three major respiratory diseases. Adding our synthetic coughs improves the performance obtainable from a relatively small, unbalanced healthcare dataset, boosting accuracy by more than 30%. Our data augmentation reduces overfitting and discourages the prediction of a single, dominant class. These results highlight the feasibility of automatic, cough-based respiratory disease diagnosis using smartphones or wearables in the wild.
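The balancing step described in the abstract (adding synthetic samples per class until the dataset is no longer dominated by one class) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names, the toy feature vectors, and the helper names `synthetic_needed`, `jitter_generator`, and `balance` are all hypothetical, and `jitter_generator` is a simple noise-based stand-in for a trained per-class generator such as CoughGAN, not a GAN itself.

```python
import random
from collections import Counter


def synthetic_needed(labels):
    """Return, per class, how many synthetic samples are needed
    for that class to match the size of the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def jitter_generator(real_samples, rng, scale=0.05):
    """Stand-in for a trained per-class generator (NOT a GAN):
    copy a random real feature vector and add small Gaussian noise."""
    base = rng.choice(real_samples)
    return [x + rng.gauss(0.0, scale) for x in base]


def balance(features, labels, generator=jitter_generator, seed=0):
    """Return (features, labels) extended with synthetic samples so that
    every class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the real feature vectors by class label.
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)

    out_x, out_y = list(features), list(labels)
    for cls, n_missing in synthetic_needed(labels).items():
        for _ in range(n_missing):
            out_x.append(generator(by_class[cls], rng))
            out_y.append(cls)
    return out_x, out_y


# Toy, made-up data: four classes with an unbalanced label distribution.
labels = (["healthy"] * 6 + ["disease_a"] * 3
          + ["disease_b"] * 2 + ["disease_c"] * 1)
features = [[float(i), float(i % 3)] for i in range(len(labels))]

print(synthetic_needed(labels))
# {'healthy': 0, 'disease_a': 3, 'disease_b': 4, 'disease_c': 5}

bal_x, bal_y = balance(features, labels)
print(Counter(bal_y))  # every class now has 6 samples
```

In a pipeline like the one the abstract describes, the balanced feature set would then be passed to a classifier such as a Support Vector Machine or Random Forest. Synthetic samples should be added to the training split only, so that test metrics reflect real recordings.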