Corponi Filippo, Li Bryan M, Anmella Gerard, Valenzuela-Pascual Clàudia, Mas Ariadna, Pacchiarotti Isabella, Valentí Marc, Grande Iria, Benabarre Antoni, Garriga Marina, Vieta Eduard, Young Allan H, Lawrie Stephen M, Whalley Heather C, Hidalgo-Mazzei Diego, Vergari Antonio
School of Informatics, University of Edinburgh, Edinburgh, United Kingdom.
The Alan Turing Institute, London, United Kingdom.
JMIR Mhealth Uhealth. 2024 Jul 17;12:e55094. doi: 10.2196/55094.
Personal sensing, leveraging data passively and near-continuously collected with wearables from patients in their ecological environment, is a promising paradigm to monitor mood disorders (MDs), a major determinant of the worldwide disease burden. However, collecting and annotating wearable data is resource intensive. Studies of this kind can thus typically afford to recruit only a few dozen patients. This constitutes one of the major obstacles to applying modern supervised machine learning techniques to MD detection.
In this paper, we overcame this data bottleneck and advanced the detection of acute MD episodes from wearable data by building on recent advances in self-supervised learning (SSL). This approach leverages unlabeled data to learn representations during pretraining, which are subsequently exploited for a supervised task.
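As an illustration of how unlabeled data can supply its own training signal, the following minimal sketch builds one example for a masked-value prediction task: some samples of a segment are hidden, and the hidden values become the targets. This is an assumption made for illustration only; the abstract does not state which surrogate tasks were used, and the function name, mask fraction, and mask value are hypothetical.

```python
import random


def make_masked_example(segment, mask_fraction=0.15, mask_value=0.0, seed=None):
    """Build one self-supervised example from an unlabeled segment.

    Returns (masked_input, targets), where targets maps each masked
    position to the original value a model would be asked to recover.
    Hypothetical helper; not the surrogate task from the paper.
    """
    if not segment:
        raise ValueError("segment must be non-empty")
    rng = random.Random(seed)
    # Mask at least one sample, never more than the segment holds.
    n_masked = min(len(segment), max(1, round(len(segment) * mask_fraction)))
    positions = sorted(rng.sample(range(len(segment)), n_masked))
    masked_input = list(segment)
    targets = {}
    for i in positions:
        targets[i] = segment[i]       # label comes from the data itself
        masked_input[i] = mask_value  # hide the value from the model
    return masked_input, targets


# Usage: 20 unlabeled samples, 20% of them masked.
signal = [float(v) for v in range(1, 21)]
masked, targets = make_masked_example(signal, mask_fraction=0.2, seed=0)
```

No human annotation is involved at any point: the pair (masked input, targets) is derived entirely from the raw recording, which is what allows pretraining on data sets collected for unrelated purposes.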
We collected open access data sets recorded with the Empatica E4 wristband that span personal sensing tasks unrelated to MD monitoring, ranging from emotion recognition in Super Mario players to stress detection in undergraduates, and devised a preprocessing pipeline performing on-/off-body detection, sleep/wake detection, segmentation, and (optionally) feature extraction. With 161 E4-recorded subjects, we introduced E4SelfLearning, the largest open access collection to date, together with its preprocessing pipeline. We developed a novel E4-tailored transformer (E4mer) architecture, serving as the blueprint for both SSL and fully supervised learning; we assessed whether and under which conditions self-supervised pretraining led to an improvement over fully supervised baselines (ie, the fully supervised E4mer and pre-deep learning algorithms) in detecting acute MD episodes from recording segments taken from 64 patients (n=32, 50% acute; n=32, 50% stable).
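The segmentation step of such a pipeline can be sketched as cutting a recording into fixed-length windows. The sketch below is a generic illustration under stated assumptions, not the E4SelfLearning implementation: the function name, window size, and step are hypothetical, and the abstract does not specify the segment length or overlap actually used.

```python
def segment_recording(samples, window_size, step=None):
    """Split a sequence of samples into fixed-length windows.

    step defaults to window_size (non-overlapping windows); a smaller
    step yields overlapping windows. A trailing remainder shorter than
    window_size is dropped. Hypothetical helper for illustration.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    step = window_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    last_start = len(samples) - window_size
    return [samples[start:start + window_size]
            for start in range(0, last_start + 1, step)]


# Usage: 10 samples, windows of 4, without and with overlap.
recording = list(range(10))
non_overlapping = segment_recording(recording, window_size=4)
overlapping = segment_recording(recording, window_size=4, step=2)
```

Dropping the incomplete final window keeps every segment the same length, which is convenient when segments are later batched for a model; padding the remainder instead would be an equally reasonable design choice.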
SSL significantly outperformed fully supervised pipelines using either our novel E4mer or extreme gradient boosting (XGBoost): n=3353 (81.23%) against n=3110 (75.35%; E4mer) and n=2973 (72.02%; XGBoost) correctly classified recording segments from a total of 4128 segments. SSL performance was strongly associated with the specific surrogate task used for pretraining, as well as with unlabeled data availability.
We showed that SSL, a paradigm where a model is pretrained on unlabeled data with no need for human annotations before deployment on the supervised target task of interest, helps overcome the annotation bottleneck; the choice of the pretraining surrogate task and the size of unlabeled data for pretraining are key determinants of SSL success. We introduced E4mer, which can be used for SSL, and shared the E4SelfLearning collection, along with its preprocessing pipeline, which can foster and expedite future research into SSL for personal sensing.