College of Artificial Intelligence, Yango University, Fuzhou, Fujian, 350001, China.
Institute of Finance, National Yang Ming Chiao Tung University, Hsinchu, 30010, Taiwan.
Sci Rep. 2022 Oct 27;12(1):18042. doi: 10.1038/s41598-022-22130-2.
Modern money transfer services are convenient, which attracts fraudsters who run scams that deceive victims into transferring funds to fraudulent accounts. Because traditional rule-based approaches detect fraud poorly, machine learning models are now broadly applied. Learning directly from raw transaction data is impractical because of its high dimensionality; most studies instead construct features by extracting patterns from the raw transactions. Past literature categorizes these features into recency, frequency, monetary, and anomaly detection features. Using real transaction data and various machine learning algorithms, we examine the performance of features in these four categories and compare them with features produced by our feature generation guideline, which is based on statistical perspectives and the characteristics of (non-)fraudulent accounts. The results show that, except for the monetary category, the feature categories used in the literature perform poorly regardless of which machine learning algorithm is used; anomaly detection features perform the worst. We also find that even statistical features generated from financial knowledge yield limited performance on a real transaction dataset. Our atypical-detection characteristic for normal accounts improves the ability to distinguish them from fraudulent accounts and hence improves the overall detection results, outperforming other existing methods.
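To make the recency, frequency, and monetary categories concrete, the following is a minimal sketch of how raw transactions might be aggregated into per-account features. It is not the authors' feature set: the `(account_id, timestamp, amount)` schema, the function name `rfm_features`, and the specific statistics chosen are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def rfm_features(transactions, as_of):
    """Aggregate raw transactions into per-account RFM-style features.

    transactions: iterable of (account_id, timestamp, amount) tuples
                  (hypothetical schema, assumed for this sketch).
    as_of:        datetime at which the features are computed.
    """
    by_account = defaultdict(list)
    for account_id, timestamp, amount in transactions:
        if timestamp <= as_of:  # ignore transactions after the reference time
            by_account[account_id].append((timestamp, amount))

    features = {}
    for account_id, rows in by_account.items():
        last_seen = max(ts for ts, _ in rows)
        amounts = [amt for _, amt in rows]
        features[account_id] = {
            # Recency: days since the account's most recent transaction.
            "recency_days": (as_of - last_seen).total_seconds() / 86400.0,
            # Frequency: number of transactions observed up to as_of.
            "frequency": len(rows),
            # Monetary: total and average transferred amount.
            "monetary_total": sum(amounts),
            "monetary_mean": mean(amounts),
        }
    return features


# Toy data, invented purely to exercise the function.
toy = [
    ("A", datetime(2022, 1, 1), 100.0),
    ("A", datetime(2022, 1, 3), 50.0),
    ("B", datetime(2022, 1, 2), 900.0),
]
feats = rfm_features(toy, as_of=datetime(2022, 1, 5))
```

The resulting dictionary maps each account to a fixed-length feature vector, which is the kind of low-dimensional input a classifier could be trained on in place of the raw transaction stream.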