Domain embeddings for generating complex descriptions of concepts in Italian language.

Author information

Maisto Alessandro

Affiliation

Department of Politics and Communication Science, University of Salerno, via Giovanni Paolo II, 132, 84084, Fisciano, SA, Italy.

Publication information

Cogn Process. 2025 Feb;26(1):91-120. doi: 10.1007/s10339-024-01234-9. Epub 2024 Oct 21.

Abstract

In this work, we propose a Distributional Semantic resource enriched with linguistic and lexical information extracted from electronic dictionaries. This resource is designed to bridge the gap between the continuous semantic values represented by distributional vectors and the discrete descriptions provided by general semantics theory. Recently, many researchers have focused on the connection between embeddings and a comprehensive theory of semantics and meaning. This often involves translating the representation of word meanings in Distributional Models into a set of discrete, manually constructed properties, such as semantic primitives or features, using neural decoding techniques. Our approach introduces an alternative strategy based on linguistic data. We have developed a collection of domain-specific co-occurrence matrices derived from two sources: a list of Italian nouns classified into four semantic traits and 20 concrete noun sub-categories and Italian verbs classified by their semantic classes. In these matrices, the co-occurrence values for each word are calculated exclusively with a defined set of words relevant to a particular lexical domain. The resource includes 21 domain-specific matrices, one comprehensive matrix, and a Graphical User Interface. Our model facilitates the generation of reasoned semantic descriptions of concepts by selecting matrices directly associated with concrete conceptual knowledge, such as a matrix based on location nouns and the concept of animal habitats. We assessed the utility of the resource through two experiments, achieving promising outcomes in both the automatic classification of animal nouns and the extraction of animal features.
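The abstract says that, in each domain-specific matrix, a word's co-occurrence values are computed only against a fixed set of words from one lexical domain. The sketch below illustrates that idea on a toy example and is not the author's implementation. The function names (`domain_cooccurrence`, `to_matrix`), the symmetric token window, the use of raw counts, and the three-sentence corpus are all assumptions made for illustration; the abstract does not specify the window size, the weighting scheme, or the corpus.

```python
from collections import Counter, defaultdict


def domain_cooccurrence(sentences, targets, domain_words, window=2):
    """Count how often each domain word occurs within `window` tokens of each target.

    Hypothetical helper: context words outside `domain_words` are ignored,
    which is what makes the resulting matrix domain-specific.
    """
    targets = set(targets)
    domain = set(domain_words)
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token not in targets:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i and tokens[j] in domain:
                    counts[token][tokens[j]] += 1
    return counts


def to_matrix(counts, targets, domain_words):
    """Lay the counts out as rows (targets) by columns (domain words)."""
    return [[counts[t][d] for d in domain_words] for t in targets]


# Toy corpus (assumed, not from the paper): animal nouns as targets,
# location nouns as the lexical domain.
sentences = [
    ["il", "lupo", "vive", "nel", "bosco"],
    ["il", "delfino", "nuota", "nel", "mare"],
    ["il", "lupo", "attraversa", "la", "montagna"],
]
animals = ["lupo", "delfino"]
locations = ["bosco", "mare", "montagna"]

counts = domain_cooccurrence(sentences, animals, locations, window=4)
matrix = to_matrix(counts, animals, locations)

for animal, row in zip(animals, matrix):
    print(animal, dict(zip(locations, row)))
```

In this toy setup, each row can be read as a rough habitat profile of an animal noun over location nouns, which mirrors the abstract's example of pairing a location-noun matrix with the concept of animal habitats.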
