Self-supervised representation learning from 12-lead ECG data.

Author information

Physikalisch Technische Bundesanstalt, Berlin, Germany; Fraunhofer Heinrich Hertz Institute, Berlin, Germany.

Fraunhofer Heinrich Hertz Institute, Berlin, Germany.

Publication information

Comput Biol Med. 2022 Feb;141:105114. doi: 10.1016/j.compbiomed.2021.105114. Epub 2021 Dec 18.

Abstract

Clinical 12-lead electrocardiography (ECG) is one of the most widely encountered kinds of biosignals. Despite the increased availability of public ECG datasets, label scarcity remains a central challenge in the field. Self-supervised learning represents a promising way to alleviate this issue. It would allow training more powerful models given the same amount of labeled data, and incorporating or improving predictions about rare diseases, for which training datasets are inherently limited. In this work, we put forward the first comprehensive assessment of self-supervised representation learning from clinical 12-lead ECG data. To this end, we adapt state-of-the-art self-supervised methods based on instance discrimination and latent forecasting to the ECG domain. In a first step, we learn contrastive representations and evaluate their quality based on linear evaluation performance on a recently established, comprehensive, clinical ECG classification task. In a second step, we analyze the impact of self-supervised pretraining on finetuned ECG classifiers as compared to purely supervised performance. For the best-performing method, an adaptation of contrastive predictive coding, we find a linear evaluation performance only 0.5% below supervised performance. For the finetuned models, we find improvements in downstream performance of roughly 1% compared to supervised performance, together with improved label efficiency and improved robustness against physiological noise. This work clearly establishes the feasibility of extracting discriminative representations from ECG data via self-supervised learning, and the numerous advantages of finetuning such representations on downstream tasks as compared to purely supervised training. As the first comprehensive assessment of its kind in the ECG domain carried out exclusively on publicly available datasets, we hope to establish a first step towards reproducible progress in the rapidly evolving field of representation learning for biosignals.
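
The contrastive objectives mentioned in the abstract (instance discrimination and contrastive predictive coding) are commonly built on an InfoNCE-style loss. The following is a minimal NumPy sketch of that loss, not the authors' implementation: the function name `info_nce_loss`, the `temperature` value, and the use of in-batch negatives are assumptions made for illustration. Row `i` of `z_a` and row `i` of `z_b` are taken to be embeddings of two views of the same ECG recording, and all other rows act as negatives.

```python
import numpy as np


def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE loss for a batch of paired embeddings (illustrative sketch).

    z_a, z_b: arrays of shape (batch, dim). z_a[i] and z_b[i] form a
    positive pair; every other row of z_b is a negative for z_a[i].
    The temperature default is a hypothetical choice, not from the paper.
    """
    # L2-normalise so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = z_a @ z_b.T / temperature  # shape (batch, batch)

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positive pairs sit on the diagonal; average their negative log-likelihood.
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    # Two slightly perturbed copies stand in for two augmented views.
    aligned = info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape))
    # Shifting the rows breaks the pairing, so the loss should be higher.
    shuffled = info_nce_loss(z, np.roll(z, 1, axis=0))
    print(f"aligned: {aligned:.4f}, shuffled: {shuffled:.4f}")
```

In a linear evaluation setting such as the one the abstract describes, an encoder trained with a loss of this kind would be frozen and only a linear classifier fitted on top of its representations.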
