Determining the Number of Latent Factors in Statistical Multi-Relational Learning.

Authors

Shi Chengchun, Lu Wenbin, Song Rui

Affiliation

Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA.

Publication information

J Mach Learn Res. 2019;20.

PMID: 31983896
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6980192/
Abstract

Statistical relational learning is primarily concerned with learning and inferring relationships between entities in large-scale knowledge graphs. Nickel et al. (2011) proposed the RESCAL tensor factorization model for statistical relational learning, which achieves better or at least comparable results on common benchmark data sets when compared to other state-of-the-art methods. Given a positive integer s, RESCAL computes an s-dimensional latent vector for each entity. The latent factors can be further used for solving relational learning tasks, such as collective classification, collective entity resolution and link-based clustering. The focus of this paper is to determine the number of latent factors in the RESCAL model. Due to the structure of the RESCAL model, its log-likelihood function is not concave. As a result, the corresponding maximum likelihood estimators (MLEs) may not be consistent. Nonetheless, we design a specific pseudometric, prove the consistency of the MLEs under this pseudometric and establish their rate of convergence. Based on these results, we propose a general class of information criteria and prove their model selection consistency when the number of relations is either bounded or diverges at a proper rate relative to the number of entities. Simulations and real data examples show that our proposed information criteria have good finite sample properties.
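
To make the selection problem concrete, here is a minimal sketch of how a candidate number of latent factors s might be scored by a penalized log-likelihood. The abstract does not give the model or the criteria in formula form, so everything specific here is an assumption: the logistic link on the RESCAL score a_i^T R_k a_j, the function names, the "log-likelihood minus s times a penalty" form of the criterion, and the example log-likelihood values, which are made up for illustration and are not results from the paper.

```python
import numpy as np


def rescal_logits(A, R):
    """Return theta[k, i, j] = a_i^T R_k a_j.

    A: (n, s) array of entity factors; R: (K, s, s) array of relation matrices.
    """
    return np.einsum("is,kst,jt->kij", A, R, A)


def log_likelihood(Y, A, R):
    """Bernoulli log-likelihood of a binary tensor Y of shape (K, n, n).

    Assumption: logistic link, P(Y[k, i, j] = 1) = sigmoid(theta[k, i, j]).
    """
    theta = rescal_logits(A, R)
    # y * theta - log(1 + exp(theta)), using logaddexp for numerical stability
    return float(np.sum(Y * theta - np.logaddexp(0.0, theta)))


def select_dimension(loglik_by_s, penalty):
    """Pick the s that maximizes loglik(s) - s * penalty.

    Hypothetical criterion form; the paper's actual penalty is not given
    in the abstract.
    """
    scores = {s: ll - s * penalty for s, ll in loglik_by_s.items()}
    return max(scores, key=scores.get), scores


# Illustrative (made-up) maximized log-likelihoods for s = 1, 2, 3.
best_s, scores = select_dimension({1: -100.0, 2: -60.0, 3: -59.0}, penalty=5.0)
print(best_s, scores)
```

In this toy input, moving from s = 2 to s = 3 improves the fit by less than the penalty, so the smaller model is preferred. In practice each log-likelihood would come from fitting the model at that s with an optimizer of choice and evaluating `log_likelihood` at the fitted factors.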

Similar articles

1
Determining the Number of Latent Factors in Statistical Multi-Relational Learning.
J Mach Learn Res. 2019;20.
2
Bridging Weighted Rules and Graph Random Walks for Statistical Relational Models.
Front Robot AI. 2018 Feb 19;5:8. doi: 10.3389/frobt.2018.00008. eCollection 2018.
3
A Novel Tensor Learning Model for Joint Relational Triplet Extraction.
IEEE Trans Cybern. 2024 Apr;54(4):2483-2494. doi: 10.1109/TCYB.2023.3265851. Epub 2024 Mar 18.
4
Leveraging Multi-source knowledge for Chinese clinical named entity recognition via relational graph convolutional network.
J Biomed Inform. 2022 Apr;128:104035. doi: 10.1016/j.jbi.2022.104035. Epub 2022 Feb 23.
5
GLOBAL RATES OF CONVERGENCE OF THE MLES OF LOG-CONCAVE AND s-CONCAVE DENSITIES.
Ann Stat. 2016;44(3):954-981. doi: 10.1214/15-AOS1394. Epub 2016 Apr 11.
6
Protein functional properties prediction in sparsely-label PPI networks through regularized non-negative matrix factorization.
BMC Syst Biol. 2015;9 Suppl 1(Suppl 1):S9. doi: 10.1186/1752-0509-9-S1-S9. Epub 2015 Jan 21.
7
Text-Graph Enhanced Knowledge Graph Representation Learning.
Front Artif Intell. 2021 Aug 17;4:697856. doi: 10.3389/frai.2021.697856. eCollection 2021.
8
Fast Inference for the Latent Space Network Model Using a Case-Control Approximate Likelihood.
J Comput Graph Stat. 2012;21(4):901-919. doi: 10.1080/10618600.2012.679240. Epub 2012 Apr 4.
9
A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces.
J Mach Learn Res. 2016;17(16):1-26.
10
High-quality gene/disease embedding in a multi-relational heterogeneous graph after a joint matrix/tensor decomposition.
J Biomed Inform. 2022 Feb;126:103973. doi: 10.1016/j.jbi.2021.103973. Epub 2022 Jan 4.

References cited by this article

1
Bayesian Conditional Tensor Factorizations for High-Dimensional Classification.
J Am Stat Assoc. 2016;111(514):656-669. doi: 10.1080/01621459.2015.1029129. Epub 2016 Aug 18.
2
A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces.
J Mach Learn Res. 2016;17(16):1-26.
3
Some mathematical notes on three-mode factor analysis.
Psychometrika. 1966 Sep;31(3):279-311. doi: 10.1007/BF02289464.