Weill Cornell Medicine, New York, NY, 10021, USA.
BMC Med Inform Decis Mak. 2022 Aug 29;22(1):226. doi: 10.1186/s12911-022-01942-2.
The application of machine learning to cardiac auscultation has the potential to improve the accuracy and efficiency of both routine and point-of-care screenings. The use of convolutional neural networks (CNNs) on heart sound spectrograms in particular has defined state-of-the-art performance. However, the relative paucity of patient data remains a significant barrier to creating models that can adapt to a wide range of potential variability. To that end, we examined a CNN model's performance on automated heart sound classification before and after various forms of data augmentation, and aimed to identify the most effective augmentation methods for cardiac spectrogram analysis.
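As a rough illustration of the preprocessing pipeline described above, the following sketch converts a heart sound recording into a log-scaled mel spectrogram image suitable for CNN input. The sample rate, FFT size, and mel-band count are illustrative assumptions, not values reported in the study.

```python
# Minimal sketch: phonocardiogram -> log-mel spectrogram image for a CNN.
# Parameter choices (sr, n_mels, n_fft, hop_length) are illustrative only.
import numpy as np
import librosa

def heart_sound_to_spectrogram(wav_path, sr=2000, n_mels=64, n_fft=256, hop_length=64):
    """Load a heart sound recording and return a [0, 1]-scaled log-mel spectrogram."""
    y, _ = librosa.load(wav_path, sr=sr)  # heart sounds are low-frequency, so a low sample rate suffices
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                       n_fft=n_fft, hop_length=hop_length)
    S_db = librosa.power_to_db(S, ref=np.max)  # compress dynamic range
    # Normalize to [0, 1] so the spectrogram can be treated as a grayscale image.
    return ((S_db - S_db.min()) / (S_db.max() - S_db.min() + 1e-8)).astype(np.float32)
```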
We built a standard CNN model to classify cardiac sound recordings as either normal or abnormal. The baseline control model achieved a PR AUC of 0.763 ± 0.047. Among the single data augmentation techniques explored, horizontal flipping of the spectrogram image improved model performance the most, with a PR AUC of 0.819 ± 0.044. Principal component analysis (PCA) color augmentation and perturbations of the saturation and value (SV) channels of the hue-saturation-value (HSV) color scale achieved PR AUCs of 0.779 ± 0.045 and 0.784 ± 0.037, respectively. Time and frequency masking resulted in a PR AUC of 0.772 ± 0.050. Pitch shifting, time stretching and compressing, noise injection, vertical flipping, and applying random color filters negatively impacted model performance. Concatenating the best performing data augmentation technique (horizontal flip) with PCA and SV perturbations improved model performance.
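The abstract does not give implementation details for these augmentations, so the sketch below shows one plausible way to realize the better-performing families (horizontal flip, SV perturbation, PCA color augmentation, and time/frequency masking) on a generic RGB spectrogram image in [0, 1]; the jitter ranges and mask widths are arbitrary assumptions.

```python
# Hedged sketches of the augmentation families reported to help, applied to an
# (H, W, 3) RGB spectrogram image with values in [0, 1]. Hyperparameters are guesses.
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

rng = np.random.default_rng(0)

def horizontal_flip(img):
    """Mirror the spectrogram along the time axis (best single augmentation reported)."""
    return img[:, ::-1, :]

def sv_perturbation(img, max_jitter=0.1):
    """Randomly scale the saturation and value channels in HSV space."""
    hsv = rgb_to_hsv(img)
    hsv[..., 1] *= 1.0 + rng.uniform(-max_jitter, max_jitter)  # saturation
    hsv[..., 2] *= 1.0 + rng.uniform(-max_jitter, max_jitter)  # value
    return hsv_to_rgb(np.clip(hsv, 0.0, 1.0))

def pca_color_augmentation(img, sigma=0.1):
    """AlexNet-style 'fancy PCA' jitter along the principal axes of RGB pixel space."""
    pixels = img.reshape(-1, 3)
    eigvals, eigvecs = np.linalg.eigh(np.cov(pixels, rowvar=False))
    shift = eigvecs @ (rng.normal(0.0, sigma, size=3) * eigvals)  # per-channel offset
    return np.clip(img + shift, 0.0, 1.0)

def time_frequency_masking(img, max_width=10):
    """SpecAugment-style masking: zero out a random time band and a random frequency band."""
    out = img.copy()
    h, w, _ = out.shape
    t0 = rng.integers(0, w - max_width)
    f0 = rng.integers(0, h - max_width)
    out[:, t0:t0 + rng.integers(1, max_width), :] = 0.0  # time mask (columns)
    out[f0:f0 + rng.integers(1, max_width), :, :] = 0.0  # frequency mask (rows)
    return out
```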
Data augmentation can improve classification accuracy by expanding and diversifying the dataset, which protects against overfitting to random variance. However, data augmentation is necessarily domain-specific. For example, methods like noise injection have found success in other areas of automated sound classification, but in the context of cardiac sound analysis, injected noise can mimic the presence of murmurs and worsen model performance. Thus, care should be taken to select clinically appropriate forms of data augmentation so as not to negatively impact model performance.
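For concreteness, the snippet below shows the kind of waveform-level noise injection the conclusion cautions against: broadband noise added to a phonocardiogram can resemble the turbulent flow of a murmur. The target signal-to-noise ratio is an arbitrary assumption for demonstration.

```python
# Waveform-level noise injection (reported here as counterproductive for heart sounds,
# since added broadband noise can mimic a murmur). SNR value is illustrative only.
import numpy as np

def inject_noise(waveform, snr_db=20.0, rng=np.random.default_rng()):
    """Add white Gaussian noise to a waveform at a target SNR in decibels."""
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise
```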