Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning.

Authors

O'Donovan Rebecca, Sezgin Emre, Bambach Sven, Butter Eric, Lin Simon

Affiliations

The Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, United States.

Department of Psychology, Nationwide Children's Hospital, Columbus, OH, United States.

Publication information

JMIR Form Res. 2020 Jun 16;4(6):e18279. doi: 10.2196/18279.

DOI: 10.2196/18279

PMID: 32459656

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7327591/

Abstract

BACKGROUND

Qualitative self- or parent-reports used in assessing children's behavioral disorders are often inconvenient to collect and can be misleading due to missing information, rater biases, and limited validity. A data-driven approach to quantify behavioral disorders could alleviate these concerns. This study proposes a machine learning approach to identify screams in voice recordings that avoids the need to gather large amounts of clinical data for model training.

OBJECTIVE

The goal of this study is to evaluate whether a machine learning model trained only on publicly available audio data sets can detect screaming sounds in audio streams captured in an at-home setting.

METHODS

Two sets of audio samples were prepared to evaluate the model: a subset of the publicly available AudioSet data set and a set of audio data extracted from the TV show Supernanny, which was chosen for its similarity to clinical data. Scream events were manually annotated for the Supernanny data, and existing annotations were refined for the AudioSet data. Audio feature extraction was performed with a convolutional neural network pretrained on AudioSet. A gradient-boosted tree model was trained and cross-validated for scream classification on the AudioSet data and then validated independently on the Supernanny audio.
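
The training and cross-validation step described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the pretrained-network embeddings are replaced by synthetic two-dimensional vectors, and a hand-written ensemble of depth-1 trees (stumps) boosted on logistic-loss residuals stands in for whatever gradient-boosting library the study used. All function names (`fit_stump`, `fit_boosted`, `predict_proba`, `cross_validate`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Pick the single-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:]

def fit_boosted(X, y, rounds=20, lr=0.3):
    """Each round fits a stump to the logistic-loss residuals y - p."""
    model, F = [], [0.0] * len(X)
    for _ in range(rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        j, t, lm, rm = fit_stump(X, residuals)
        model.append((j, t, lr * lm, lr * rm))  # leaf values stored already scaled
        F = [fi + (lr * lm if x[j] <= t else lr * rm) for x, fi in zip(X, F)]
    return model

def predict_proba(model, x):
    """Estimated probability that feature vector x belongs to the scream class."""
    return sigmoid(sum(lv if x[j] <= t else rv for j, t, lv, rv in model))

def cross_validate(X, y, k=5):
    """k-fold cross-validation; returns the accuracy on each held-out fold."""
    idx = list(range(len(X)))
    random.shuffle(idx)
    accs = []
    for f in range(k):
        held_out = idx[f::k]
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_boosted([X[i] for i in train], [y[i] for i in train])
        correct = sum((predict_proba(model, X[i]) >= 0.5) == (y[i] == 1) for i in held_out)
        accs.append(correct / len(held_out))
    return accs

# Synthetic stand-in for embeddings: label 1 = scream, label 0 = other audio.
random.seed(0)
y = [1] * 50 + [0] * 50
X = [[random.gauss(2.0 * yi, 1.0), random.gauss(2.0 * yi, 1.0)] for yi in y]
accs = cross_validate(X, y)
print("per-fold accuracy:", [round(a, 2) for a in accs])
```

In the study itself, the cross-validated model was additionally checked on a separate source of audio (the Supernanny episodes); the sketch covers only the cross-validation part.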

RESULTS

On the held-out AudioSet clips, the model achieved an area under the receiver operating characteristic curve (ROC-AUC) of 0.86. The same model applied to three full episodes of Supernanny audio achieved an ROC-AUC of 0.95 and an average precision (positive predictive value) of 42%, even though screams made up only 1.3% (92 of 7166 seconds) of the total run time.
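
To make the two reported metrics concrete, the minimal Python sketch below computes ROC-AUC and average precision from per-second scores. The labels and scores are synthetic placeholders that only mirror the reported class balance (92 scream seconds out of 7166); they are not study data, and the helper names `roc_auc` and `average_precision` are hypothetical. The point of the example is that a detector ranking at random would be expected to reach an average precision close to the prevalence, about 1.3%, which is the baseline against which the reported 42% should be read.

```python
import random

def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each positive example.

    Assumes at least one positive label; tied scores are ranked in input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Synthetic per-second labels with the class balance reported in the abstract.
random.seed(0)
n_total, n_scream = 7166, 92
labels = [1] * n_scream + [0] * (n_total - n_scream)
# Assumed score model: scream seconds score higher on average than the rest.
scores = [random.gauss(2.0 if y else 0.0, 1.0) for y in labels]

prevalence = n_scream / n_total
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"prevalence (chance-level precision): {prevalence:.3f}")
print(f"ROC-AUC: {auc:.2f}")
print(f"average precision: {ap:.2f}")
```

ROC-AUC is insensitive to class balance, whereas average precision depends on it strongly, which is why both figures are informative when the positive class is this rare.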

CONCLUSIONS

These results suggest that a scream-detection model trained with publicly available data could be valuable for monitoring clinical recordings and identifying tantrums, without depending on the collection of costly, privacy-protected clinical data for model training.
