Macierzanka Krzysztof, Sau Arunashis, Patlatzoglou Konstantinos, Pastika Libor, Sieliwonczyk Ewa, Gurnani Mehak, Peters Nicholas S, Waks Jonathan W, Kramer Daniel B, Ng Fu Siong
National Heart and Lung Institute, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK.
Department of Cardiology, Hammersmith Hospital, Imperial College Healthcare NHS Trust, Du Cane Road, London W12 0NN, UK.
Eur Heart J Digit Health. 2025 Feb 25;6(3):417-426. doi: 10.1093/ehjdh/ztaf011. eCollection 2025 May.
Many research databases with anonymized patient data contain electrocardiograms (ECGs) from which traditional identifiers have been removed. We evaluated the ability of artificial intelligence (AI) methods to determine the similarity between ECGs and assessed whether they have the potential to be misused to re-identify individuals from anonymized datasets.
We used a convolutional Siamese neural network (SNN) architecture, which derives a Euclidean distance similarity metric between two input ECGs. The model was developed on a secondary care dataset of 864 283 ECGs (72 455 subjects). Siamese neural network-electrocardiogram (SNN-ECG) achieves an accuracy of 91.68% when classifying between 2 689 124 same-subject pairs and 2 689 124 different-subject pairs. This performance increases to 93.61% and 95.97% in the outpatient and normal ECG subsets, respectively. In a simulated 'motivated intruder' test, SNN-ECG can identify individuals from large datasets. In datasets of 100, 1000, 10 000, and 20 000 ECGs, in which exactly one ECG comes from the reference individual, it achieves success rates of 79.2%, 62.6%, 45.0%, and 40.0%, respectively. By random chance, the success rates would be 1%, 0.1%, 0.01%, and 0.005%, respectively. Additional basic information, such as subject sex or age range, enhances performance further. We also found that, at the subject level, ECG pair similarity is clinically relevant: greater ECG dissimilarity is associated with all-cause mortality [hazard ratio, 1.22 (1.21-1.23), P < 0.0001] and is additive to an AI-ECG model trained for mortality prediction.
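The distance-based matching described above can be illustrated with a minimal sketch. It assumes each ECG has already been mapped to a fixed-length embedding vector by a trained network; the network itself is not shown, and the function names (`euclidean_distance`, `is_same_subject`, `intruder_match`, `chance_success_rate`) and the toy vectors are hypothetical, not taken from the study.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.dist(a, b)

def is_same_subject(a, b, threshold):
    """Pair classification: call two ECGs 'same subject' when their
    embeddings are closer than a chosen threshold (threshold is an assumption)."""
    return euclidean_distance(a, b) < threshold

def intruder_match(reference, candidates):
    """'Motivated intruder' search: return the index of the candidate
    embedding that lies closest to the reference embedding."""
    return min(range(len(candidates)),
               key=lambda i: euclidean_distance(reference, candidates[i]))

def chance_success_rate(n_candidates):
    """Success rate of picking one candidate uniformly at random
    when exactly one candidate is the true match."""
    return 1.0 / n_candidates

# Toy 2-D embeddings (illustrative values only).
reference = [0.0, 0.0]
candidates = [[3.0, 4.0], [0.1, 0.0], [5.0, 5.0]]
best = intruder_match(reference, candidates)

# Chance baselines for the dataset sizes quoted in the abstract.
baselines = {n: chance_success_rate(n) for n in (100, 1000, 10_000, 20_000)}
```

The chance baselines follow directly from 1/N, which gives 1%, 0.1%, 0.01%, and 0.005% for the four dataset sizes; the reported success rates are the fraction of trials in which the nearest candidate is the true match.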
Anonymized ECGs retain information that may facilitate subject re-identification, raising privacy and data protection concerns. However, SNN-ECG models also have positive uses and can enhance risk prediction of cardiovascular disease.