School of Finance, Anhui University of Finance and Economics, Bengbu 233030, China.
School of Business, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China.
Comput Intell Neurosci. 2021 Dec 10;2021:1165296. doi: 10.1155/2021/1165296. eCollection 2021.
To detect comprehensive clues and provide more accurate forecasting in the early stage of financial distress, researchers have emphasized, in addition to financial indicators, the digitalization of lengthy but indispensable textual disclosures such as Management Discussion and Analysis (MD&A). However, most studies split the long text into words and treat it as word-count vectors, which introduces massive amounts of invalid information while ignoring meaningful context. To represent large-scale text efficiently, this study proposes an end-to-end neural network model based on hierarchical self-attention, built on a state-of-the-art pretrained model that provides context-aware text embeddings. The proposed model has two notable characteristics. First, the hierarchical self-attention assigns high weights only to essential content at the word and sentence levels and automatically discards the large amount of information irrelevant to risk prediction, making it suitable for extracting the effective parts of large-scale text. Second, after fine-tuning, the word embeddings adapt to the specific contexts of the samples and convey the original text more accurately without excessive manual operations. Experiments confirm that adding text improves the accuracy of financial distress forecasting and that the proposed model outperforms benchmark models in AUC and F2-score. For visualization, the elements of the hierarchical self-attention weight matrices act as scalars that estimate the importance of each word and sentence. In this way, the "red-flag" statements that imply financial risk are identified and highlighted in the original text, providing effective references for decision-makers.
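A minimal sketch of this kind of hierarchical self-attention classifier, assuming PyTorch; the module names, dimensions, and additive-attention pooling scheme are illustrative assumptions rather than the authors' implementation. Contextual token embeddings (e.g., from a fine-tuned pretrained encoder) are pooled by word-level attention into sentence vectors, which are pooled by sentence-level attention into a document vector fed to a distress classifier, and both levels of attention weights are returned for later inspection.

```python
import torch
import torch.nn as nn


class AttentionPool(nn.Module):
    """Additive self-attention pooling; returns a weighted sum and the weights."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, x: torch.Tensor):
        # x: (batch, seq, dim)
        weights = torch.softmax(self.score(x).squeeze(-1), dim=-1)  # (batch, seq)
        pooled = torch.bmm(weights.unsqueeze(1), x).squeeze(1)      # (batch, dim)
        return pooled, weights


class HierarchicalSelfAttention(nn.Module):
    """Word-level attention builds sentence vectors; sentence-level attention
    builds a document vector used for binary financial-distress prediction."""

    def __init__(self, embed_dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.word_attn = AttentionPool(embed_dim)
        self.sent_attn = AttentionPool(embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_embeds: torch.Tensor):
        # token_embeds: (batch, n_sentences, n_tokens, embed_dim), e.g.,
        # contextual embeddings produced by a fine-tuned pretrained encoder.
        b, s, t, d = token_embeds.shape
        sent_vecs, word_w = self.word_attn(token_embeds.reshape(b * s, t, d))
        doc_vec, sent_w = self.sent_attn(sent_vecs.reshape(b, s, d))
        logits = self.classifier(doc_vec)
        return logits, word_w.reshape(b, s, t), sent_w
```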
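The returned attention weights can then play the role of the importance scores described in the abstract. The usage below is a hypothetical illustration (random inputs and an arbitrary top-k cutoff), showing how the highest-weighted sentences could be flagged as "red-flag" candidates for highlighting:

```python
# Illustrative only: random tensors stand in for encoder outputs of one MD&A report.
model = HierarchicalSelfAttention(embed_dim=768)
tokens = torch.randn(1, 12, 30, 768)          # 1 report, 12 sentences, 30 tokens each
logits, word_weights, sent_weights = model(tokens)

# Sentences with the largest sentence-level attention weights are treated as
# the most important ones for the risk prediction.
top_sentences = sent_weights[0].topk(k=3).indices.tolist()
print("Predicted class:", logits.argmax(dim=-1).item())
print("Highest-weighted ('red-flag' candidate) sentences:", top_sentences)
```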