Univ Rennes, INSERM, LTSI-UMR 1099, F-35000 Rennes, France.
Voxygen, F-22560 Pleumeur-Bodou, France.
Sensors (Basel). 2022 Feb 25;22(5):1823. doi: 10.3390/s22051823.
Cry analysis is an important tool to evaluate the development of preterm infants. However, the context of Neonatal Intensive Care Units is challenging, since a wide variety of sounds can occur (e.g., alarms and adult voices). In this paper, a method to extract cries is proposed. It is based on an initial segmentation between silence and sound events, followed by feature extraction on the resulting audio segments and a cry versus non-cry classification. A database of 198 cry events from 21 newborns and 439 non-cry events was created. Then, a set of features, including Mel-Frequency Cepstral Coefficients, issued from principal component analysis was computed to describe each audio segment. For the first time in cry analysis, noise was handled using harmonic plus noise analysis. Several machine learning models were compared. The K-Nearest Neighbours approach showed the best results, with a precision of 92.9%. To test the approach in a monitoring application, 412 h of recordings were automatically processed. The automatically selected cries were replayed, and a precision of 92.2% was obtained. The impact of errors on the fundamental frequency characterisation was also studied. Results show that, despite a difficult context, automatic cry extraction for non-invasive monitoring of the vocal development of preterm infants is achievable.
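The three stages named in the abstract (silence/sound segmentation, per-segment classification with K-Nearest Neighbours, and evaluation by precision) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the energy threshold, frame length, two-dimensional toy feature vectors, and all function names are hypothetical, and the paper's actual features (Mel-Frequency Cepstral Coefficients, principal component analysis, harmonic plus noise analysis) are not reproduced here.

```python
import math
from collections import Counter


def segment_sound_events(samples, frame_len=4, threshold=0.01):
    """Return (start, end) sample indices of runs of frames whose
    mean energy exceeds `threshold` (hypothetical value)."""
    events, start = [], None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold and start is None:
            start = i * frame_len              # a sound event begins
        elif energy <= threshold and start is not None:
            events.append((start, i * frame_len))  # the event ends
            start = None
    if start is not None:                      # event runs to the last frame
        events.append((start, n_frames * frame_len))
    return events


def knn_predict(train_X, train_y, x, k=3):
    """Label `x` by majority vote among its k nearest training
    points (Euclidean distance)."""
    dists = sorted((math.dist(p, x), label) for p, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def precision(y_true, y_pred, positive="cry"):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy usage: every value below is made up for illustration.
signal = [0.0] * 8 + [0.5, -0.5, 0.4, -0.4] * 2 + [0.0] * 4
events = segment_sound_events(signal)

train_X = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7), (0.1, 0.2), (0.2, 0.1), (0.15, 0.3)]
train_y = ["cry"] * 3 + ["non-cry"] * 3
label = knn_predict(train_X, train_y, (0.8, 0.8))
```

Precision is the natural metric for the monitoring scenario described above: it measures how many of the segments labelled as cries really are cries, which is what matters when the selected segments feed a later analysis such as fundamental frequency characterisation.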