IEEE Trans Cybern. 2022 Dec;52(12):13848-13861. doi: 10.1109/TCYB.2021.3109066. Epub 2022 Nov 18.
In many financial applications, such as fraud detection, reject inference, and credit evaluation, detecting clusters automatically is critical because it reveals subpatterns in the data that can be used to infer users' behaviors and identify potential risks. Due to the complexity of human behavior and changing social environments, the distributions of financial data are usually complex, and it is challenging to find clusters and give reasonable interpretations of them. The goal of this study is to develop an integrated approach that detects clusters in financial data and optimizes the scope of the clusters so that they can be easily interpreted. Specifically, we first proposed a new cluster quality evaluation criterion, which requires no large-scale computation and can guide base clustering algorithms such as k-Means to detect hyperellipsoidal clusters adaptively. Then, we designed a new solver for a revised support vector data description model, which efficiently refines the centroids and scopes of the detected clusters. The refinement makes the clusters tighter, so that the data within each cluster share greater similarity and the clusters can be easily interpreted with eigenvectors. Experiments on ten financial datasets showed that the proposed algorithm can efficiently find a reasonable number of clusters. The proposed approach is suitable for large-scale financial datasets whose features are meaningful, and is also applicable to financial mining tasks such as data distribution interpretation and anomaly detection.
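The idea of interpreting hyperellipsoidal clusters through eigenvectors can be illustrated with a minimal sketch. It is not the authors' method: the abstract gives neither the quality criterion nor the revised support vector data description solver, so the sketch swaps in plain Lloyd's k-Means with a fixed k and a closed-form eigendecomposition of each cluster's 2-D covariance matrix. All function names (`kmeans`, `principal_axis_2d`, `describe_clusters`, `make_demo_points`) and the synthetic data are assumptions made for illustration only.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-Means on a list of equal-length tuples (a stand-in)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def principal_axis_2d(members):
    """Eigenvalues and major-axis direction of a 2-D covariance matrix."""
    n = len(members)
    mx = sum(x for x, _ in members) / n
    my = sum(y for _, y in members) / n
    sxx = sum((x - mx) ** 2 for x, _ in members) / n
    syy = sum((y - my) ** 2 for _, y in members) / n
    sxy = sum((x - mx) * (y - my) for x, y in members) / n
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # Angle of the eigenvector that belongs to the larger eigenvalue.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (half_trace + disc, half_trace - disc), (math.cos(angle), math.sin(angle))


def describe_clusters(points, k):
    """Return (centroid, eigenvalues, major axis) per cluster, sorted by centroid."""
    centroids, labels = kmeans(points, k)
    summary = []
    for j, centroid in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        eigenvalues, axis = principal_axis_2d(members)
        summary.append((centroid, eigenvalues, axis))
    return sorted(summary)


def make_demo_points(seed=1, n=200):
    """Synthetic data (hypothetical): one cluster stretched along x, one along y."""
    rng = random.Random(seed)
    wide = [(rng.gauss(0, 2.0), rng.gauss(0, 0.3)) for _ in range(n)]
    tall = [(rng.gauss(20, 0.3), rng.gauss(20, 2.0)) for _ in range(n)]
    return wide + tall


if __name__ == "__main__":
    for centroid, eigenvalues, axis in describe_clusters(make_demo_points(), k=2):
        print(centroid, eigenvalues, axis)
```

Each cluster's major axis shows the direction in feature space along which its members vary most, and the ratio of the two eigenvalues shows how elongated the cluster is. In a dataset with meaningful features, this is the kind of reading that eigenvector-based interpretation relies on.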