Kostrzewa Łukasz, Nowak Robert
Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland.
Sensors (Basel). 2022 Mar 9;22(6):2137. doi: 10.3390/s22062137.
In this work, we address the problem of classifying Polish court rulings based on their text. We use natural language processing methods and classifiers based on convolutional and recurrent neural networks. We prepared a dataset of 144,784 authentic, anonymized Polish court rulings. We analyze various general-language embedding matrices and multiple neural network architectures with different parameters. The results show that such models can classify documents with very high accuracy (>99%). We also include an analysis of incorrectly predicted examples. Performance analysis shows that our method is fast and could be used in practice on typical server hardware with two central processing units (CPUs) or with one CPU and a graphics processing unit (GPU).