Kim Mi-Young, Rabelo Juliano, Babiker Housam Khalifa Bashier, Rahman Md Abed, Goebel Randy
Department of Science, Augustana Faculty, University of Alberta, Camrose, Alberta, Canada.
Alberta Machine Intelligence Institute, University of Alberta, Edmonton, Alberta, Canada.
Rev Socionetwork Strateg. 2024;18(1):101-121. doi: 10.1007/s12626-023-00153-z. Epub 2024 Jan 11.
The challenge of information overload in the legal domain increases every day. The COLIEE competition has created four challenge tasks intended to encourage the development of systems and methods that alleviate some of that pressure: case law retrieval (Task 1) and entailment (Task 2), and statute law retrieval (Task 3) and entailment (Task 4). Here we describe our methods for Task 1 and Task 4. In Task 1, we used a sentence-transformer model to create a numeric representation of each case paragraph. We then created a histogram of the similarities between the paragraphs of a query case and those of a candidate case. The histogram is used to build a binary classifier that decides whether a candidate case should be noticed or not. In Task 4, our approach relies on fine-tuning a pre-trained DeBERTa large language model (LLM) that was trained on the SNLI and MultiNLI datasets. Our method for Task 4 was ranked third among eight participating teams in the COLIEE 2023 competition. For Task 4, we also compared the performance of the DeBERTa model with that of a knowledge distillation model and of ensemble methods, including Random Forest and Voting.
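The Task 1 pipeline summarized above (paragraph embeddings, a histogram of pairwise similarities, then a binary classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Toy 3-dimensional vectors stand in for sentence-transformer embeddings (in practice these would come from a sentence-transformer model's encoder). The 10 equal-width bins over [-1, 1] are an assumed choice. The logistic regression is a hypothetical stand-in, because the abstract does not name the classifier that was used.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two paragraph embeddings (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_histogram(query: List[Vector], candidate: List[Vector], bins: int = 10) -> List[float]:
    """Fixed-length feature vector for one (query case, candidate case) pair.

    Every query paragraph is compared with every candidate paragraph. The
    similarities, which lie in [-1, 1], are counted in `bins` equal-width bins
    (assumed binning) and divided by the number of pairs, so cases with
    different paragraph counts produce comparable features.
    """
    counts = [0] * bins
    for q in query:
        for c in candidate:
            s = cosine(q, c)
            idx = int((s + 1.0) / 2.0 * bins)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp s == 1.0 and rounding noise
    pairs = len(query) * len(candidate)
    return [n / pairs for n in counts] if pairs else [0.0] * bins


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], x: List[float]) -> float:
    """Probability that the candidate should be noticed; w[-1] is the bias."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(xs: List[List[float]], ys: List[int], epochs: int = 500, lr: float = 0.5) -> List[float]:
    """Hypothetical stand-in classifier: logistic regression fit by batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(xs, ys):
            err = predict_proba(w, x) - y  # gradient of log loss w.r.t. the logit
            for j in range(dim):
                grad[j] += err * x[j]
            grad[dim] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Toy 3-d "embeddings" standing in for sentence-transformer outputs.
query = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
noticed = [[1.0, 0.05, 0.0], [0.95, 0.0, 0.1]]  # paragraphs close to the query
unrelated = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # roughly orthogonal paragraphs

features = [similarity_histogram(query, noticed), similarity_histogram(query, unrelated)]
labels = [1, 0]
weights = train_logistic(features, labels)
for name, x in zip(["noticed", "unrelated"], features):
    print(name, round(predict_proba(weights, x), 3))
```

Dividing the counts by the number of paragraph pairs is a design choice made in this sketch so that long and short cases map to comparable feature vectors. The histogram keeps only the distribution of similarities and discards which specific paragraphs matched.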