Determining the Number of Latent Factors in Statistical Multi-Relational Learning.

Authors

Shi Chengchun, Lu Wenbin, Song Rui

Affiliation

Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA.

Publication

J Mach Learn Res. 2019;20.

PMID:31983896

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6980192/

Abstract

Statistical relational learning is primarily concerned with learning and inferring relationships between entities in large-scale knowledge graphs. Nickel et al. (2011) proposed a RESCAL tensor factorization model for statistical relational learning, which achieves better or at least comparable results on common benchmark data sets when compared to other state-of-the-art methods. Given a positive integer s, RESCAL computes an s-dimensional latent vector for each entity. The latent factors can be further used for solving relational learning tasks, such as collective classification, collective entity resolution and link-based clustering. The focus of this paper is to determine the number of latent factors in the RESCAL model. Due to the structure of the RESCAL model, its log-likelihood function is not concave. As a result, the corresponding maximum likelihood estimators (MLEs) may not be consistent. Nonetheless, we design a specific pseudometric, prove the consistency of the MLEs under this pseudometric and establish its rate of convergence. Based on these results, we propose a general class of information criteria and prove their model selection consistencies when the number of relations is either bounded or diverges at a proper rate of the number of entities. Simulations and real data examples show that our proposed information criteria have good finite sample properties.
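
To make the role of the latent dimension concrete, here is a minimal sketch of how a RESCAL-style model turns entity vectors into link probabilities. It assumes a logistic link, which the abstract does not specify, and the function name `rescal_probabilities`, the sizes `n`, `K`, `s`, and the random parameters are all illustrative rather than taken from the paper.

```python
import math
import random


def rescal_probabilities(A, R):
    """Return P[k][i][j] = sigmoid(a_i^T R_k a_j).

    A: list of n entity vectors, each of length s (the latent factors).
    R: list of K relation matrices, each s x s.
    The logistic link is an assumption made for this sketch.
    """
    n, s = len(A), len(A[0])
    probs = []
    for R_k in R:
        # Precompute R_k a_j for every entity j; reused for every i.
        Ra = [[sum(R_k[p][q] * A[j][q] for q in range(s)) for p in range(s)]
              for j in range(n)]
        P_k = [[1.0 / (1.0 + math.exp(-sum(A[i][p] * Ra[j][p] for p in range(s))))
                for j in range(n)]
               for i in range(n)]
        probs.append(P_k)
    return probs


rng = random.Random(0)
n, K, s = 4, 2, 3  # illustrative sizes: entities, relations, latent factors
A = [[rng.gauss(0, 1) for _ in range(s)] for _ in range(n)]
R = [[[rng.gauss(0, 1) for _ in range(s)] for _ in range(s)] for _ in range(K)]
P = rescal_probabilities(A, R)  # K matrices of size n x n
```

Because the score is bilinear in the entity vectors, multiplying every entity vector by a constant c and dividing every relation matrix by c² leaves all probabilities unchanged. Different parameter values can therefore describe the same model, which helps explain why closeness of estimates is more naturally measured through the fitted probabilities than through the raw parameters.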

Similar articles

Bridging Weighted Rules and Graph Random Walks for Statistical Relational Models.

Front Robot AI. 2018 Feb 19;5:8. doi: 10.3389/frobt.2018.00008. eCollection 2018.

A Novel Tensor Learning Model for Joint Relational Triplet Extraction.

IEEE Trans Cybern. 2024 Apr;54(4):2483-2494. doi: 10.1109/TCYB.2023.3265851. Epub 2024 Mar 18.

Leveraging Multi-source knowledge for Chinese clinical named entity recognition via relational graph convolutional network.

J Biomed Inform. 2022 Apr;128:104035. doi: 10.1016/j.jbi.2022.104035. Epub 2022 Feb 23.

Global Rates of Convergence of the MLEs of Log-Concave and s-Concave Densities.

Ann Stat. 2016;44(3):954-981. doi: 10.1214/15-AOS1394. Epub 2016 Apr 11.

Protein functional properties prediction in sparsely-label PPI networks through regularized non-negative matrix factorization.

BMC Syst Biol. 2015;9 Suppl 1(Suppl 1):S9. doi: 10.1186/1752-0509-9-S1-S9. Epub 2015 Jan 21.

Text-Graph Enhanced Knowledge Graph Representation Learning.

Front Artif Intell. 2021 Aug 17;4:697856. doi: 10.3389/frai.2021.697856. eCollection 2021.

Fast Inference for the Latent Space Network Model Using a Case-Control Approximate Likelihood.

J Comput Graph Stat. 2012;21(4):901-919. doi: 10.1080/10618600.2012.679240. Epub 2012 Apr 4.

A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces.

J Mach Learn Res. 2016;17(16):1-26.

High-quality gene/disease embedding in a multi-relational heterogeneous graph after a joint matrix/tensor decomposition.

J Biomed Inform. 2022 Feb;126:103973. doi: 10.1016/j.jbi.2021.103973. Epub 2022 Jan 4.
