Department of Electrical and Computer Engineering, Aarhus University, Aarhus, 8200, Denmark.
Spar Nord Bank, 9100, Aalborg, Denmark.
Sci Data. 2023 Sep 28;10(1):661. doi: 10.1038/s41597-023-02569-2.
Bank transactions are highly confidential. As a result, there are no public data sets of real transactions that can be used to investigate and compare anti-money laundering (AML) methods in banks. This severely limits research on important AML problems such as efficiency, effectiveness, class imbalance, concept drift, and interpretability. To address the issue, we present SynthAML: a synthetic data set to benchmark statistical and machine learning methods for AML. The data set builds on real data from Spar Nord, a systemically important Danish bank, and contains 20,000 AML alerts and over 16 million transactions. Experimental results indicate that performance measured on SynthAML transfers to real-world data. As use cases, we present and discuss open problems in the AML literature.