Kuizinienė Dovilė, Savickas Paulius, Kunickaitė Rimantė, Juozaitienė Rūta, Damaševičius Robertas, Maskeliūnas Rytis, Krilavičius Tomas
Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania.
Silesian University of Technology, Gliwice, Poland.
PeerJ Comput Sci. 2024 Apr 30;10:e1956. doi: 10.7717/peerj-cs.1956. eCollection 2024.
Financial distress identification remains an essential topic in the scientific literature due to its importance for society and the economy. Advances in information technology and the growing volume of stored data have extended financial distress analysis beyond financial statements and their indicators (ratios). The feature space can be expanded by incorporating new categories of feature data, such as macroeconomic, sector, social, board, management, and judicial incident data, among others. However, the increased dimensionality results in sparse data and overfitted models. This study proposes a new approach for efficient financial distress classification assessment by combining dimensionality reduction and machine learning techniques. The proposed framework aims to identify a subset of features that minimizes the loss function describing financial distress in an enterprise. During the study, 15 dimensionality reduction techniques with different numbers of features and 17 machine learning models were compared. Overall, 1,432 experiments were performed using Lithuanian enterprise data covering the period from 2015 to 2022. The results revealed that the artificial neural network (ANN) model with 30 ranked features identified using the Random Forest mean decreasing Gini (RF_MDG) feature selection technique provided the highest AUC score. Moreover, this study introduces a novel approach for feature extraction, which could improve financial distress classification models.
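The two quantities the abstract relies on, a Gini-based feature ranking and the AUC score, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: RF_MDG averages impurity decreases over every split in many randomized trees, whereas the sketch below scores each feature by the best single-threshold split only, as a simplifying assumption. All function names (`gini`, `best_gini_decrease`, `rank_features`, `auc`) and the toy data are hypothetical.

```python
from typing import List, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary labels (0 = healthy, 1 = distressed)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_gini_decrease(values: Sequence[float], labels: Sequence[int]) -> float:
    """Largest impurity decrease achievable by one threshold split on a feature.

    Simplification: a random forest would average such decreases over all
    splits in all trees; here a single split stands in for that average.
    """
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Every distinct value except the largest is a candidate threshold.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k features with the largest Gini decrease."""
    scores = [best_gini_decrease([row[j] for row in X], y) for j in range(len(X[0]))]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Requires at least one positive and one negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): four enterprises, three candidate features.
X_toy = [[0.1, 5.0, 1.0], [0.2, 3.0, 1.0], [0.8, 4.0, 1.0], [0.9, 6.0, 1.0]]
y_toy = [0, 0, 1, 1]
top_two = rank_features(X_toy, y_toy, k=2)
```

In the study's setting, the ranking step would keep the top 30 features and pass them to a classifier such as an ANN, whose predicted scores would then be evaluated with AUC on held-out data.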