Shinya Yuta, Ueno Taiji, Kawai Masahiko, Niwa Fusako, Tomotaki Seiichi, Myowa Masako
Graduate School of Education, The University of Tokyo, Tokyo, Japan.
School of Arts and Sciences, Tokyo Women's Christian University, Tokyo, Japan.
Sci Rep. 2025 Jul 2;15(1):23204. doi: 10.1038/s41598-025-03098-1.
Early infant crying provides critical insights into neurodevelopment, with atypical acoustic features linked to conditions such as preterm birth. However, previous studies have focused on limited and specific acoustic features, hindering a more comprehensive understanding of crying. To address this, we employed a convolutional neural network to assess whether whole Mel-spectrograms of infant crying capture gestational age (GA) variations (79 preterm infants; 52 term neonates). Our convolutional neural network models showed high accuracy in classifying gestational groups (92.4%) and in estimating the relative and continuous differences in GA (r = 0.73; p < 0.0001), outperforming previous studies. Grad-CAM and spectrogram manipulations further revealed that GA variations in infant crying were prominently reflected in temporal structures, particularly at the onset and offset regions of vocalizations. These findings suggest that decoding spectrotemporal features in infant crying through deep learning may offer valuable insights into atypical neurodevelopment in preterm infants, with potential to enhance early detection and intervention strategies in clinical practice.
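As context for the pipeline the abstract describes, the whole-spectrogram input to the convolutional network is a log-Mel-spectrogram of the cry recording. The paper does not specify its extraction parameters, so the sketch below is a minimal, self-contained illustration (plain NumPy, with assumed values for sample rate, FFT size, hop length, and number of Mel bands; the chirp signal is a synthetic placeholder, not cry data):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / (c - l) if c > l else 0.0
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / (r - c) if r > c else 0.0
    return fb

def mel_spectrogram(y, sr=16000, n_fft=512, hop=128, n_mels=64):
    # Frame the signal, apply a Hann window, take the power STFT,
    # then project onto the Mel filterbank and convert to dB.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (frames, n_mels)
    return 10.0 * np.log10(mel.T + 1e-10)                 # (n_mels, frames)

# Synthetic 1-second chirp as a stand-in for a cry recording.
sr = 16000
t = np.linspace(0.0, 1.0, sr, endpoint=False)
y = np.sin(2 * np.pi * (400 + 200 * t) * t)
S = mel_spectrogram(y, sr=sr)
print(S.shape)  # 2-D image-like array: (Mel bands, time frames)
```

The resulting 2-D array is what a CNN treats as an image; the study's attention to cry onsets and offsets corresponds to the leftmost and rightmost time frames of such a spectrogram.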