Physikalisch-Technische Bundesanstalt, Berlin, Germany.
Fraunhofer Heinrich Hertz Institute, Berlin, Germany.
Sci Data. 2020 May 25;7(1):154. doi: 10.1038/s41597-020-0495-6.
Electrocardiography (ECG) is a key non-invasive diagnostic tool for cardiovascular diseases, and it is increasingly supported by algorithms based on machine learning. Two major obstacles to the development of automatic ECG interpretation algorithms are the lack of public datasets and the lack of well-defined benchmarking procedures that allow different algorithms to be compared. To address these issues, we put forward PTB-XL, the largest freely accessible clinical 12-lead ECG-waveform dataset to date, comprising 21837 10-second records from 18885 patients. The ECG-waveform data was annotated by up to two cardiologists as a multi-label dataset, in which diagnostic labels were further aggregated into superclasses and subclasses. The dataset covers a broad range of diagnostic classes, including, in particular, a large fraction of healthy records. Additional metadata on demographics, further diagnostic statements, diagnosis likelihoods, manually annotated signal properties, and suggested folds for splitting training and test sets make the dataset a rich resource for the development and evaluation of automatic ECG interpretation algorithms.
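The two structural features named in the abstract, multi-label diagnostic statements aggregated into superclasses and suggested folds for splitting the data, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the `Record` type, the helper names `superclass_labels` and `split_by_fold`, the toy records, and the statement-to-superclass mapping excerpt are hypothetical and do not reproduce the dataset's actual files or schema. The choice of fold 9 for validation and fold 10 for testing is likewise an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical excerpt of a statement-to-superclass mapping (illustrative only;
# a real pipeline would read the mapping from the dataset's statement metadata).
STATEMENT_TO_SUPERCLASS = {
    "NORM": "NORM",
    "IMI": "MI",
    "ASMI": "MI",
    "NDT": "STTC",
    "LAFB": "CD",
    "LVH": "HYP",
}
SUPERCLASSES = ["NORM", "MI", "STTC", "CD", "HYP"]


@dataclass
class Record:
    record_id: int
    patient_id: int
    statements: dict[str, float]  # statement code -> likelihood (assumed 0-100)
    fold: int  # suggested fold number (assumed 1..10)


def superclass_labels(record: Record) -> list[int]:
    """Aggregate a record's statements into a multi-hot superclass vector.

    Likelihood values are ignored here; thresholding on them is one possible variation.
    """
    active = {
        STATEMENT_TO_SUPERCLASS[code]
        for code in record.statements
        if code in STATEMENT_TO_SUPERCLASS  # skip codes without a superclass mapping
    }
    return [1 if cls in active else 0 for cls in SUPERCLASSES]


def split_by_fold(records: list[Record], val_fold: int = 9, test_fold: int = 10):
    """Partition records by suggested fold so each fold lands in exactly one split."""
    train = [r for r in records if r.fold not in (val_fold, test_fold)]
    val = [r for r in records if r.fold == val_fold]
    test = [r for r in records if r.fold == test_fold]
    return train, val, test


# Toy usage with hypothetical records.
records = [
    Record(1, 100, {"NORM": 100.0}, fold=3),
    Record(2, 101, {"IMI": 80.0, "LAFB": 100.0}, fold=9),
    Record(3, 102, {"LVH": 50.0, "NDT": 100.0}, fold=10),
]
for r in records:
    print(r.record_id, superclass_labels(r))
train, val, test = split_by_fold(records)
print([r.record_id for r in train], [r.record_id for r in val], [r.record_id for r in test])
```

Because a record may carry several statements, its label is a multi-hot vector rather than a single class: record 2 above maps to both MI and CD. Splitting on whole folds, rather than sampling records at random, keeps the train, validation, and test partitions fixed, which is what makes results from different algorithms comparable.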