Li Jiamin, Liu Xingbo, Nie Xiushan, Ma Lele, Li Peng, Zhang Kai, Yin Yilong
School of Software, Shandong University, Jinan, China.
School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China.
Comput Intell Neurosci. 2021 Apr 16;2021:6650962. doi: 10.1155/2021/6650962. eCollection 2021.
Similar judicial case matching aims to accurately select, from multiple candidates, the judicial document most similar to a target document. Its core is computing the similarity between two case fact documents. With such techniques, legal professionals can promptly find and assess similar cases in a candidate set, which can also benefit the development of judicial systems. However, judicial case documents are not only long but also structurally complex. Meanwhile, the number and variety of judicial cases are growing rapidly, so it is difficult to find the document most similar to the target in a large corpus. In this study, we present a novel similar judicial case matching model that learns the weights of judicial feature attributes through hash learning and achieves fast similarity matching using binary codes. The proposed model extracts judicial feature attribute vectors using the bidirectional encoder representations from transformers (BERT) model and then obtains weighted judicial feature attributes by learning a hash function. We further impose triplet constraints to ensure that the similarity of judicial case data is well preserved when projected into the Hamming space. Comprehensive experiments on public datasets show that the proposed method is superior in the task of similar judicial case matching and is suitable for large-scale similar judicial case matching.
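The matching idea described in the abstract (project feature vectors into Hamming space, compare binary codes, and constrain the projection with triplets) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear projection `W`, the tanh relaxation, the margin value, and all function names are assumptions made for illustration, and the small hand-written vectors stand in for BERT-derived feature vectors.

```python
import numpy as np

def hash_codes(features, W):
    """Project real-valued features with W and binarize to {0, 1} codes.
    W is a hypothetical linear hash projection, not the paper's learned one."""
    return (features @ W > 0).astype(np.uint8)

def hamming_distance(a, b):
    """Number of positions at which two binary codes differ."""
    return int(np.count_nonzero(a != b))

def triplet_loss(anchor, positive, negative, W, margin=2.0):
    """Hinge triplet loss on tanh-relaxed codes (an assumed relaxation).
    It is zero once the negative is farther from the anchor than the
    positive by at least `margin`."""
    relax = lambda x: np.tanh(x @ W)
    d_pos = np.sum((relax(anchor) - relax(positive)) ** 2)
    d_neg = np.sum((relax(anchor) - relax(negative)) ** 2)
    return float(max(0.0, margin + d_pos - d_neg))

def most_similar(target, candidates, W):
    """Return the index of the candidate whose code is closest to the
    target's code in Hamming distance, plus all distances."""
    t = hash_codes(target, W)
    dists = [hamming_distance(t, hash_codes(c, W)) for c in candidates]
    return int(np.argmin(dists)), dists

# Toy usage: 4-dimensional stand-ins for document feature vectors.
W = np.eye(4)                                   # placeholder projection
target = np.array([1.0, -1.0, 1.0, -1.0])
candidates = [np.array([-1.0, 1.0, -1.0, 1.0]),  # dissimilar document
              np.array([1.0, -1.0, 1.0, 1.0])]   # similar document
best, dists = most_similar(target, candidates, W)
print(best, dists)
```

Because the comparison reduces to counting differing bits, the cost per candidate is independent of the original document length, which is what makes binary codes attractive for a large corpus.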