

NELasso: Group-Sparse Modeling for Characterizing Relations Among Named Entities in News Articles.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2017 Oct;39(10):2000-2014. doi: 10.1109/TPAMI.2016.2632117. Epub 2016 Nov 23.

Abstract

Named entities such as people, locations, and organizations play a vital role in characterizing online content. They often reflect information of interest and are frequently used in search queries. Although named entities can be detected reliably from textual content, extracting relations among them is more challenging, yet useful in various applications (e.g., news recommendation systems). In this paper, we present a novel model and system for learning semantic relations among named entities from collections of news articles. We model each named entity occurrence with sparse structured logistic regression, and consider the words (predictors) to be grouped based on background semantics. This sparse group LASSO approach forces the weights of word groups that do not influence the prediction towards zero. The resulting sparse structure is utilized for defining the type and strength of relations. Our unsupervised system yields a network of named entities in which each relation is typed, quantified, and characterized in context. These relations are the key to understanding news material over time and customizing newsfeeds for readers. Extensive evaluation of our system on articles from TIME magazine and BBC News shows that the learned relations correlate with static semantic relatedness measures like WLM, and capture the evolving relationships among named entities over time.
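The sparse group LASSO idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the solver (proximal gradient descent), the penalty weights `lam_l1` and `lam_group`, the function names, the omission of an intercept, and the toy data are all assumptions made here for illustration. Rows stand for articles, columns for centered word indicators (+1 present, -1 absent), and the label says whether a hypothetical entity occurs in the article. Words 0 and 1 form one semantic group, words 2 and 3 another.

```python
import math
from itertools import product

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def prox_sparse_group(w, groups, t_l1, t_group):
    # Proximal operator of t_l1*||w||_1 + sum_g t_group*sqrt(|g|)*||w_g||_2.
    out = [0.0] * len(w)
    for g in groups:
        # Step 1: elementwise soft-thresholding (sparsity within a group).
        s = [math.copysign(max(abs(w[j]) - t_l1, 0.0), w[j]) for j in g]
        # Step 2: shrink the whole group; weak groups collapse to zero.
        norm = math.sqrt(sum(v * v for v in s))
        thr = t_group * math.sqrt(len(g))
        scale = max(1.0 - thr / norm, 0.0) if norm > 0 else 0.0
        for j, v in zip(g, s):
            out[j] = scale * v
    return out

def fit_sparse_group_logreg(X, y, groups, lam_l1=0.02, lam_group=0.05,
                            step=0.5, iters=500):
    # Proximal gradient descent on the mean logistic loss (no intercept).
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad[j] += r * xi[j] / n
        moved = [wj - step * gj for wj, gj in zip(w, grad)]
        w = prox_sparse_group(moved, groups, step * lam_l1, step * lam_group)
    return w

# Hypothetical toy data: words 0 and 1 track the label exactly, while
# words 2 and 3 take every +/-1 combination and carry no signal.
X, y = [], []
for label, sign in ((1, 1.0), (0, -1.0)):
    for x2, x3 in product((-1.0, 1.0), repeat=2):
        X.append([sign, sign, x2, x3])
        y.append(label)

groups = [[0, 1], [2, 3]]
weights = fit_sparse_group_logreg(X, y, groups)
# Per-group weight norms, used here only as an illustrative summary.
group_norms = [math.sqrt(sum(weights[j] ** 2 for j in g)) for g in groups]
print(weights, group_norms)
```

In this toy setup the uninformative group is driven to exactly zero while the informative group keeps positive weights, which is the kind of sparse structure the abstract says is used to define relations. How the paper maps that structure to relation type and strength is not specified in the abstract; the group norms above are only one plausible summary.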

