Gu Xiao, Liu Zhangdaihong, Han Jinpei, Qiu Jianing, Fang Wenfei, Lu Lei, Clifton Lei, Zhang Yuan-Ting, Clifton David A
Department of Engineering Science, University of Oxford, Oxford, UK.
Oxford Suzhou Centre for Advanced Research, University of Oxford, Suzhou, China.
Commun Eng. 2025 Jul 26;4(1):135. doi: 10.1038/s44172-025-00467-6.
Healthcare wearables are transforming health monitoring, generating vast and complex data in everyday free-living environments. While supervised deep learning has enabled major advances in interpreting such data, it remains heavily dependent on large labeled datasets, which are often difficult and expensive to obtain in clinical practice. Self-supervised contrastive learning (SSCL) provides a promising alternative by learning from unlabeled data, but conventional SSCL treats all non-identical instances as unrelated and therefore frequently overlooks important physiological similarities, which can result in suboptimal representations. In this study, we revisit the enduring value of the domain knowledge embedded in traditional feature engineering pipelines and demonstrate how it can be used to guide SSCL. We introduce a framework that integrates clinically meaningful features, such as heart rate variability from electrocardiograms (ECGs), into the contrastive learning process. These features guide the formation of more relevant positive pairs through nearest-neighbor matching and promote global structure through clustering-based prototype representations. Evaluated across diverse wearable technologies, our method, fine-tuned with only 10% of the labeled data, achieves performance comparable to that of conventional SSCL approaches fine-tuned with full annotations. This work highlights the indispensable and sustainable role of domain expertise in advancing machine learning for real-world healthcare, especially for healthcare wearables.
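The two guidance mechanisms named in the abstract, nearest-neighbor positive pairs and clustering-based prototypes, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes an InfoNCE-style contrastive objective and plain k-means clustering, neither of which the abstract specifies, and every function name and parameter below (`nearest_neighbor_positives`, `kmeans_assignments`, `guided_info_nce`, `prototype_loss`, `temperature`) is hypothetical.

```python
import numpy as np


def standardize(x):
    """Z-score each feature column so no single feature dominates distances."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)


def nearest_neighbor_positives(domain_feats):
    """Index of each sample's nearest neighbour in domain-feature space.

    The sample itself is excluded, so every anchor gets a different
    sample as its positive.
    """
    z = standardize(domain_feats)
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    return dist.argmin(axis=1)


def kmeans_assignments(domain_feats, k, n_iter=20, seed=0):
    """Plain k-means on domain features (assumed clustering method)."""
    z = standardize(domain_feats)
    rng = np.random.default_rng(seed)
    centers = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # leave empty clusters where they are
                centers[c] = z[labels == c].mean(axis=0)
    return labels


def _log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def guided_info_nce(emb, pos_idx, temperature=0.1):
    """InfoNCE-style loss whose positives come from domain-feature neighbours."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    logits = e @ e.T / temperature
    np.fill_diagonal(logits, -np.inf)  # an anchor is never its own candidate
    log_prob = _log_softmax(logits)
    return -log_prob[np.arange(len(e)), pos_idx].mean()


def prototype_loss(emb, labels, temperature=0.1):
    """Cross-entropy between embeddings and their cluster prototypes.

    Each prototype is the normalised mean embedding of one cluster.
    """
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    classes = np.unique(labels)
    protos = np.stack([e[labels == c].mean(axis=0) for c in classes])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    log_prob = _log_softmax(e @ protos.T / temperature)
    targets = np.searchsorted(classes, labels)
    return -log_prob[np.arange(len(e)), targets].mean()
```

In a training loop, `domain_feats` would hold hand-crafted features per recording (for example heart rate variability statistics) and `emb` would be the encoder outputs for the same recordings. The two losses would then be combined with a weighting that the abstract does not state.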