Alsuhaibani Mohammed
Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia.
PeerJ Comput Sci. 2025 Jun 16;11:e2957. doi: 10.7717/peerj-cs.2957. eCollection 2025.
Natural language inference (NLI) is a fundamental task in natural language processing that focuses on determining the relationship between pairs of sentences. In this article, we present a straightforward approach to evaluating the effectiveness of various transformer-based models, such as bidirectional encoder representations from transformers (BERT), the generative pre-trained transformer (GPT), the robustly optimized BERT approach (RoBERTa), and XLNet, in generating sentence embeddings for NLI. We conduct comprehensive experiments with different pooling techniques and evaluate the resulting embeddings under different norms across multiple layers of each model. Our results demonstrate that the choice of pooling strategy, norm, and model layer significantly impacts NLI performance, with the best results achieved using max pooling and the L2 norm at specific model layers. On the Stanford Natural Language Inference (SNLI) dataset, the model reached 90% accuracy and an 86% F1-score, while on the MedNLI dataset the highest recorded F1-score was 84%. This article provides insights into how different models and evaluation strategies can be combined effectively to improve the understanding and classification of sentence relationships in NLI tasks.
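The pooling-and-norm step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses random arrays as stand-ins for the token embeddings of one transformer layer, applies max pooling over the sequence dimension, L2-normalizes the pooled vectors, and builds a common NLI feature vector from the premise and hypothesis embeddings. The feature layout `[p; h; |p-h|; p*h]` is an assumed, widely used choice, not one confirmed by the abstract.

```python
import numpy as np

def max_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Max pooling over the sequence axis: (seq_len, dim) -> (dim,)."""
    return token_embeddings.max(axis=0)

def l2_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    return v / max(np.linalg.norm(v), eps)

# Random arrays standing in for one layer's token embeddings
# (hidden size 768, as in BERT-base; an assumption for illustration).
rng = np.random.default_rng(0)
premise_tokens = rng.normal(size=(7, 768))     # 7 premise tokens
hypothesis_tokens = rng.normal(size=(5, 768))  # 5 hypothesis tokens

p = l2_normalize(max_pool(premise_tokens))
h = l2_normalize(max_pool(hypothesis_tokens))

# A common NLI classifier input: concatenation of both embeddings,
# their absolute difference, and their element-wise product.
features = np.concatenate([p, h, np.abs(p - h), p * h])
```

In the paper's setting, the same pooling and normalization would be applied to the hidden states of each candidate layer of each model, and the resulting feature vectors fed to a classifier over the entailment/neutral/contradiction labels.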