Determining the Number of Latent Factors in Statistical Multi-Relational Learning.

Author information

Shi Chengchun, Lu Wenbin, Song Rui

Affiliation

Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA.

Publication information

J Mach Learn Res. 2019;20.

Abstract

Statistical relational learning is primarily concerned with learning and inferring relationships between entities in large-scale knowledge graphs. Nickel et al. (2011) proposed a RESCAL tensor factorization model for statistical relational learning, which achieves better or at least comparable results on common benchmark data sets when compared to other state-of-the-art methods. Given a positive integer s, RESCAL computes an s-dimensional latent vector for each entity. The latent factors can be further used for solving relational learning tasks, such as collective classification, collective entity resolution and link-based clustering. The focus of this paper is to determine the number of latent factors in the RESCAL model. Due to the structure of the RESCAL model, its log-likelihood function is not concave. As a result, the corresponding maximum likelihood estimators (MLEs) may not be consistent. Nonetheless, we design a specific pseudometric, prove the consistency of the MLEs under this pseudometric and establish its rate of convergence. Based on these results, we propose a general class of information criteria and prove their model selection consistency when the number of relations is either bounded or diverges at a proper rate relative to the number of entities. Simulations and real data examples show that our proposed information criteria have good finite sample properties.
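
The abstract does not write the model out. As a rough sketch only, assume the usual RESCAL form, in which the score for relation k between entities i and j is the bilinear product of the latent vector of i, a relation-specific matrix R_k, and the latent vector of j. The logistic link, the toy sizes, and the function names `rescal_logits` and `log_likelihood` below are assumptions made for illustration and are not taken from the paper. The sketch shows one reason the latent factors cannot be pinned down uniquely: the likelihood depends on the parameters only through the product A R_k A^T, so a reparameterised pair gives exactly the same fit. It does not implement the paper's pseudometric or its information criteria.

```python
import numpy as np

def rescal_logits(A, R):
    """Bilinear RESCAL scores: logits[k, i, j] = a_i^T R_k a_j."""
    return np.einsum("is,kst,jt->kij", A, R, A)

def log_likelihood(Y, A, R):
    """Bernoulli log-likelihood under a logistic link (assumed for this sketch)."""
    theta = rescal_logits(A, R)
    # log p(y) = y * theta - log(1 + exp(theta)); logaddexp keeps it numerically stable
    return float(np.sum(Y * theta - np.logaddexp(0.0, theta)))

rng = np.random.default_rng(0)
n, K, s = 6, 3, 2                       # entities, relations, latent factors (toy sizes)
A = rng.normal(size=(n, s))             # one s-dimensional latent vector per entity
R = rng.normal(size=(K, s, s))          # one s-by-s interaction matrix per relation
Y = rng.integers(0, 2, size=(K, n, n))  # synthetic binary relation tensor

# Reparameterise with an invertible Q: A -> A Q, R_k -> Q^{-1} R_k Q^{-T}.
# The product A R_k A^T, and hence the likelihood, is unchanged.
Q = rng.normal(size=(s, s)) + 3.0 * np.eye(s)
Q_inv = np.linalg.inv(Q)
A2 = A @ Q
R2 = Q_inv @ R @ Q_inv.T

ll1 = log_likelihood(Y, A, R)
ll2 = log_likelihood(Y, A2, R2)
print(np.isclose(ll1, ll2))
```

Because `A` and `A2` differ while the two log-likelihood values agree, any notion of estimator consistency has to be stated with a distance that ignores such reparameterisations, which is consistent with the abstract's use of a pseudometric.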
