Alsuhaibani Mohammed
Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia.
PeerJ Comput Sci. 2025 Jun 16;11:e2957. doi: 10.7717/peerj-cs.2957. eCollection 2025.
Natural language inference (NLI) is a fundamental task in natural language processing that focuses on determining the relationship between pairs of sentences. In this article, we present a straightforward approach to evaluating the effectiveness of various transformer-based models, such as bidirectional encoder representations from transformers (BERT), the generative pre-trained transformer (GPT), the robustly optimized BERT approach (RoBERTa), and XLNet, in generating sentence embeddings for NLI. We conduct comprehensive experiments with different pooling techniques and evaluate the resulting embeddings under different norms across multiple layers of each model. Our results demonstrate that the choice of pooling strategy, norm, and model layer significantly impacts NLI performance, with the best results achieved using max pooling and the L2 norm at specific model layers. On the Stanford Natural Language Inference (SNLI) dataset, the model reached 90% accuracy and an 86% F1-score, while on the MedNLI dataset the highest recorded F1-score was 84%. This article provides insights into how different models and evaluation strategies can be combined effectively to improve the understanding and classification of sentence relationships in NLI tasks.
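The pooling-and-norm step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses random arrays as stand-ins for the token embeddings of one transformer layer, applies max pooling over the sequence dimension, L2-normalizes the pooled vectors, and builds a common NLI feature vector from the premise and hypothesis embeddings. The feature layout `[p; h; |p-h|; p*h]` is an assumed, widely used choice, not one confirmed by the abstract.

```python
import numpy as np

def max_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Max pooling over the sequence axis: (seq_len, dim) -> (dim,)."""
    return token_embeddings.max(axis=0)

def l2_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    return v / max(np.linalg.norm(v), eps)

# Random arrays standing in for one layer's token embeddings
# (hidden size 768, as in BERT-base; an assumption for illustration).
rng = np.random.default_rng(0)
premise_tokens = rng.normal(size=(7, 768))     # 7 premise tokens
hypothesis_tokens = rng.normal(size=(5, 768))  # 5 hypothesis tokens

p = l2_normalize(max_pool(premise_tokens))
h = l2_normalize(max_pool(hypothesis_tokens))

# A common NLI classifier input: concatenation of both embeddings,
# their absolute difference, and their element-wise product.
features = np.concatenate([p, h, np.abs(p - h), p * h])
```

In the paper's setting, the same pooling and normalization would be applied to the hidden states of each candidate layer of each model, and the resulting feature vectors fed to a classifier over the entailment/neutral/contradiction labels.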