
Deep Learning Based on Hierarchical Self-Attention for Finance Distress Prediction Incorporating Text.

Affiliations

School of Finance, Anhui University of Finance and Economics, Bengbu 233030, China.

School of Business, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China.

Publication information

Comput Intell Neurosci. 2021 Dec 10;2021:1165296. doi: 10.1155/2021/1165296. eCollection 2021.

Abstract

To detect comprehensive clues and forecast financial distress more accurately at an early stage, researchers have emphasized digitizing lengthy but indispensable textual disclosures, such as Management Discussion and Analysis (MD&A), in addition to financial indicators. However, most studies split the long text into words and represent it as word-count vectors, which introduces a large amount of irrelevant information while ignoring meaningful context. To represent large texts efficiently, this study proposes an end-to-end neural network model based on hierarchical self-attention, built on a state-of-the-art pretrained model that produces context-aware text embeddings. The proposed model has two notable characteristics. First, the hierarchical self-attention assigns high weights only to essential content at the word level and the sentence level and automatically neglects the large amount of information that is irrelevant to risk prediction, which makes it suitable for extracting the effective parts of large-scale text. Second, after fine-tuning, the word embeddings adapt to the specific contexts of the samples and convey the original text more accurately without excessive manual operations. Experiments confirm that adding text improves the accuracy of financial distress forecasting and that the proposed model outperforms benchmark models on AUC and F2-score. For visualization, the elements of the hierarchical self-attention weight matrix act as scalars that estimate the importance of each word and sentence. In this way, the "red-flag" statements that imply financial risk are identified and highlighted in the original text, providing an effective reference for decision-makers.
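The two-level weighting described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it swaps the self-attention layers for a simpler attention pooling against a single query vector, and it uses random vectors in place of fine-tuned pretrained embeddings. The names `attention_pool`, `encode_document`, `word_query`, and `sentence_query` are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(vectors, query):
    """Return the weighted average of the rows of `vectors` and the weights.

    Assumption: one query vector scores every row (a simplification of
    the self-attention layers mentioned in the abstract).
    """
    scores = vectors @ query / np.sqrt(vectors.shape[1])
    weights = softmax(scores)
    return weights @ vectors, weights

def encode_document(sentences, word_query, sentence_query):
    """Pool words into sentence vectors, then sentences into a document vector."""
    sentence_vectors, word_weights = [], []
    for words in sentences:
        vec, w = attention_pool(words, word_query)
        sentence_vectors.append(vec)
        word_weights.append(w)
    doc_vector, sentence_weights = attention_pool(
        np.stack(sentence_vectors), sentence_query
    )
    return doc_vector, word_weights, sentence_weights

# Toy document: three sentences with 5, 3, and 7 word embeddings of size 8.
# Random values stand in for embeddings from a pretrained language model.
rng = np.random.default_rng(0)
dim = 8
doc = [rng.normal(size=(n, dim)) for n in (5, 3, 7)]
word_query = rng.normal(size=dim)
sentence_query = rng.normal(size=dim)

doc_vector, word_weights, sentence_weights = encode_document(
    doc, word_query, sentence_query
)

# The highest-weighted sentence is the candidate "red-flag" statement.
top_sentence = int(np.argmax(sentence_weights))
```

In a trained model the query vectors and embeddings would be learned jointly with the distress classifier, so the weights would reflect relevance to risk prediction. Here they are random, and the sketch only shows how word-level and sentence-level weights arise and how they could be read off for highlighting.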


