Division of Global Mental Health, Department of Psychiatry and Behavioral Sciences, George Washington School of Medicine and Health Sciences, Washington, DC, United States.
Human and Social Development, Human Sciences Research Council, Pietermaritzburg, South Africa.
Front Public Health. 2021 Mar 29;9:633606. doi: 10.3389/fpubh.2021.633606. eCollection 2021.
The social environment, comprising social support, social burden, and quality of interactions, influences a range of health outcomes, including mental health. Passive audio data collection on mobile phones (e.g., episodic recording of the auditory environment without requiring any active input from the phone user) creates new opportunities to understand the social environment. We evaluated the use of passive audio collection on mobile phones as a window into the social environment while conducting a study of mental health among adolescent and young mothers in Nepal.
We enrolled 23 adolescent and young mothers who first participated in qualitative interviews to describe their social support and identify sounds potentially associated with that support. Then, episodic recordings were collected from the mothers for 2 weeks using an app that recorded 30 s of audio every 15 min from 4 A.M. to 9 P.M. Audio data were processed and classified using a pretrained model. Each classification category was accompanied by an estimated accuracy score. The machine-predicted speech and non-speech categories were manually validated to assess accuracy.
In qualitative interviews, mothers described a range of positive and negative social interactions and the sounds that accompanied them. Potential positive sounds included adult speech and laughter, infant babbling and laughter, and sounds from baby toys. Sounds characterizing negative stimuli included yelling, crying, and screaming by adults, and crying by infants. Sounds associated with social isolation included silence and TV or radio noises. Speech comprised 43% of all passively recorded audio clips (n = 7,725). Manual validation showed a 23% false-positive rate and a 62% false-negative rate for speech, indicating that speech exposure may be underestimated. Other common sounds were music and vehicular noises.
Passively capturing audio has the potential to improve understanding of the social environment. However, the pretrained model had limited accuracy for identifying speech and lacked categories that would distinguish positive from negative social interactions. To improve the contribution of passive audio collection to understanding the social environment, future work should improve the accuracy of audio categorization, code for constellations of sounds, and combine audio with other smartphone data collection such as location and activity.
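The false-positive and false-negative rates reported for speech come from comparing machine-predicted labels against manual labels. The sketch below illustrates one way such rates could be computed. It assumes the conventional definitions (false-positive rate = FP / (FP + TN), false-negative rate = FN / (FN + TP)); the abstract does not state which denominators the authors used. The function name `speech_error_rates` and the toy labels are hypothetical and are not study data.

```python
from collections import Counter


def speech_error_rates(predicted, manual):
    """Return (false_positive_rate, false_negative_rate) for binary speech labels.

    predicted: machine-predicted labels, True meaning "speech".
    manual:    manually validated labels for the same clips, True meaning "speech".
    Assumes the conventional definitions; the source does not specify them.
    """
    if len(predicted) != len(manual):
        raise ValueError("label sequences must have the same length")

    # Count each (predicted, manual) combination across all clips.
    counts = Counter(zip(predicted, manual))
    tp = counts[(True, True)]    # model: speech,     human: speech
    fp = counts[(True, False)]   # model: speech,     human: non-speech
    fn = counts[(False, True)]   # model: non-speech, human: speech
    tn = counts[(False, False)]  # model: non-speech, human: non-speech

    # Guard against empty denominators (no manual negatives or positives).
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr


# Hypothetical toy labels for eight clips (illustration only, not study data).
model_says_speech = [True, True, False, False, True, False, False, False]
human_says_speech = [True, False, True, False, True, True, False, False]

fpr, fnr = speech_error_rates(model_says_speech, human_says_speech)
print(f"false-positive rate: {fpr:.2f}, false-negative rate: {fnr:.2f}")
```

A high false-negative rate under these definitions means that many clips a human would label as speech are missed by the model, which is consistent with the abstract's point that speech exposure may be underestimated.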