Laguarta Jordi, Hueto Ferran, Subirana Brian
MIT AutoID Laboratory, Cambridge, MA 02139, USA.
Harvard University, Cambridge, MA 02138, USA.
IEEE Open J Eng Med Biol. 2020 Sep 29;1:275-281. doi: 10.1109/OJEMB.2020.3026928. eCollection 2020.
We hypothesized that COVID-19 subjects, including asymptomatic ones, could be accurately discriminated using only a forced-cough cell-phone recording analyzed with Artificial Intelligence. To train our MIT Open Voice model, we built a data-collection pipeline for COVID-19 cough recordings through our website (opensigma.mit.edu) between April and May 2020 and created the largest balanced audio COVID-19 cough dataset reported to date, with 5,320 subjects. We developed an AI speech-processing framework that leverages acoustic biomarker feature extractors to pre-screen for COVID-19 from cough recordings and provides a personalized patient saliency map for longitudinal monitoring in real time, non-invasively, and at essentially zero variable cost. Cough recordings are transformed into Mel-Frequency Cepstral Coefficients (MFCC) and input to a Convolutional Neural Network (CNN) architecture consisting of one Poisson biomarker layer and three pre-trained ResNet50s in parallel, which outputs a binary pre-screening diagnostic. Our CNN-based models were trained on 4,256 subjects and tested on the remaining 1,064 subjects of our dataset. Transfer learning was used to learn biomarker features on larger datasets, an approach previously validated in our lab on Alzheimer's, which significantly improves the COVID-19 discrimination accuracy of our architecture. AI techniques can produce a free, non-invasive, real-time, any-time, instantly distributable, large-scale COVID-19 asymptomatic screening tool to augment current approaches to containing the spread of COVID-19. Practical use cases include daily screening of students, workers, and the public as schools, workplaces, and transport reopen, or pool testing to quickly flag outbreaks in groups. General speech biomarkers may exist that cover several disease categories, as we demonstrated by using the same biomarkers for both COVID-19 and Alzheimer's.
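The front end of the pipeline described above converts each cough recording into MFCC features before it reaches the CNN. As a rough illustration of that transform, the following is a minimal, self-contained numpy sketch of a standard MFCC computation (framing, windowing, power spectrum, mel filterbank, log, DCT-II). The parameter values (16 kHz sampling, 512-point FFT, 40 mel bands, 13 coefficients) are common defaults assumed for illustration, not the authors' published settings; production feature extractors such as librosa additionally apply pre-emphasis, padding, and liftering.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Compute MFCC features for a mono signal (minimal sketch)."""
    # 1. Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hanning(n_fft)

    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2

    # 3. Triangular mel filterbank, spaced evenly on the mel scale.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)

    # 4. Log mel energies, then a DCT-II to decorrelate the filterbank
    #    outputs into cepstral coefficients.
    mel_energy = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2.0 * n_mels)))
    return mel_energy @ dct.T  # shape: (n_frames, n_mfcc)
```

The resulting (frames x coefficients) matrix is the kind of 2-D, image-like representation that a CNN backbone such as ResNet50 can consume directly, which is why MFCC (or mel-spectrogram) front ends pair naturally with image-classification architectures.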