Research Center for Computer Science and Information Technologies, Macedonian Academy of Sciences and Arts, Skopje, Macedonia.
Vinca Institute of Nuclear Sciences - National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia.
Comput Methods Programs Biomed. 2022 Jun;221:106901. doi: 10.1016/j.cmpb.2022.106901. Epub 2022 May 22.
This study investigates the impact of atrial flutter (A) on the atrial arrhythmia classification task. We additionally advocate the use of a subject-based split in future studies in the field, in order to avoid within-subject correlation, which may lead to over-optimistic inferences. Finally, we test how well the classifiers hold up outside the initially studied circumstances by performing an inter-dataset evaluation on data from different sources.
ECG signals from two private and three public (two MIT-BIH and Chapman ecgdb) databases were preprocessed and divided into 10 s segments, which were then subjected to feature extraction. Each resulting dataset was divided into a training set and a test set in two ways: by a random split and by a patient split. Classification was performed with the XGBoost classifier and with two benchmark classification models, each under both data splits. The trained models were then used to make predictions on the test data of the remaining datasets.
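The difference between the two splitting schemes can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`segment_signal`, `random_split`, `patient_split`, `shared_patients`), the 20% test fraction, and the toy patient identifiers are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import defaultdict


def segment_signal(signal, fs, window_s=10):
    """Cut a 1-D signal into non-overlapping windows of window_s seconds.

    fs is the sampling rate in Hz; a trailing partial window is dropped.
    """
    step = int(fs * window_s)
    return [signal[i:i + step] for i in range(0, len(signal) - step + 1, step)]


def random_split(patient_ids, test_fraction=0.2, seed=0):
    """Segment-level split: one patient's segments may land on both sides."""
    idx = list(range(len(patient_ids)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(idx) * test_fraction))
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def patient_split(patient_ids, test_fraction=0.2, seed=0):
    """Patient-level split: all segments of a patient stay on one side."""
    by_patient = defaultdict(list)
    for i, pid in enumerate(patient_ids):
        by_patient[pid].append(i)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_test = max(1, int(round(len(patients) * test_fraction)))
    test_patients = set(patients[:n_test])
    train = [i for i, pid in enumerate(patient_ids) if pid not in test_patients]
    test = [i for i, pid in enumerate(patient_ids) if pid in test_patients]
    return train, test


def shared_patients(patient_ids, train, test):
    """Patients that contribute segments to both the training and test set."""
    return {patient_ids[i] for i in train} & {patient_ids[i] for i in test}


if __name__ == "__main__":
    # Toy data (hypothetical): 5 patients with 20 segments each.
    ids = [p for p in range(5) for _ in range(20)]
    for name, split in (("random", random_split), ("patient", patient_split)):
        train, test = split(ids)
        print(name, "split, patients on both sides:",
              sorted(shared_patients(ids, train, test)))
```

With many segments per patient, a segment-level split will usually place segments of the same patient on both sides, so the test set is no longer independent of the training set. A patient-level split rules this out by construction.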
The XGBoost model yielded the best performance across all datasets compared to the benchmark models; however, model performance varied across datasets, with accuracy ranging from 70.6% to 89.4%, sensitivity from 61.4% to 76.8%, and specificity from 87.3% to 95.5%. When comparing the patient split with the random split, no significant difference was seen in the two private datasets and the Chapman dataset, where the number of samples per patient is low. In the MIT-BIH dataset, however, where the average number of samples per patient is approximately 1300, a noticeable disparity was identified. In this dataset, the accuracy, sensitivity, and specificity of 93.6%, 86.4%, and 95.9% under the random split decreased to 88%, 61.4%, and 89.8%, respectively, under the patient split, with the largest drop being in A sensitivity, from 71% to 5.4%. The inter-dataset scores were also significantly lower than their intra-dataset counterparts across all datasets.
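A small worked example shows why overall accuracy can stay high while the sensitivity for a minority class such as A collapses. The sketch below computes one-vs-rest sensitivity and specificity per class. The function name `per_class_sens_spec`, the class labels, and the ten toy predictions are invented for illustration and are not the study's data; how the aggregate sensitivity and specificity reported above were averaged is not stated here.

```python
def per_class_sens_spec(y_true, y_pred):
    """One-vs-rest sensitivity and specificity for every class label."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        sens = tp / (tp + fn) if (tp + fn) else float("nan")
        spec = tn / (tn + fp) if (tn + fp) else float("nan")
        scores[c] = (sens, spec)
    return scores


# Toy labels (hypothetical): 4 normal, 3 atrial fibrillation, 3 flutter (A).
y_true = ["N", "N", "N", "N", "AFIB", "AFIB", "AFIB", "A", "A", "A"]
# Two of the three flutter segments are mistaken for fibrillation.
y_pred = ["N", "N", "N", "N", "AFIB", "AFIB", "AFIB", "AFIB", "AFIB", "A"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
scores = per_class_sens_spec(y_true, y_pred)

if __name__ == "__main__":
    print("accuracy:", accuracy)
    for label, (sens, spec) in scores.items():
        print(label, "sensitivity:", round(sens, 3), "specificity:", round(spec, 3))
```

In this toy case the accuracy is 0.8 although only one of three A segments is detected, which is why per-class sensitivity is worth reporting alongside the aggregate figures.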
Computer-aided diagnosis (CAD) systems have great potential to assist physicians in the reliable, precise and efficient detection of arrhythmias. However, although compelling research in the field has yielded models with excellent performance on their own datasets, we show that these results may be over-optimistic. Our study gives insight into the difficulty of detecting A on several datasets and shows the need for a higher representation of A in public datasets. Furthermore, we show the necessity of a more structured evaluation of model performance, using a patient-based split and an inter-dataset testing scheme, to avoid the problem of within-subject correlation, which may lead to misleadingly high scores. Finally, we stress that progress in this area requires the creation and use of datasets with a higher number of patients and a more balanced representation of classes.