Zhu Junxing, Zhang Jiawei, Wu Quanyuan, Jia Yan, Zhou Bin, Wei Xiaokai, Yu Philip S
College of Computer, National University of Defense Technology, Changsha 410073, China.
Department of Computer Science, Florida State University, Tallahassee, FL 32306-4530, USA.
Sensors (Basel). 2017 Aug 3;17(8):1786. doi: 10.3390/s17081786.
People today are often involved in multiple heterogeneous social networks simultaneously. Discovering the anchor links between accounts owned by the same user across different social networks is crucial for many inter-network applications, e.g., cross-network link transfer and cross-network recommendation. Many supervised models have been proposed to predict anchor links, but they are effective only when labeled anchor links are abundant. In real scenarios this requirement can hardly be met, and most anchor links are unlabeled, since manually labeling inter-network anchor links is costly and tedious. To overcome this problem and utilize the numerous unlabeled anchor links in model building, in this paper we introduce the active learning based anchor link prediction problem. Unlike traditional active learning problems, due to the one-to-one constraint on anchor links, if an unlabeled anchor link a = (u, v) is identified as positive (i.e., existing), all the other unlabeled anchor links incident to account u or account v will automatically be negative (i.e., non-existing). Viewed from this perspective, asking for the labels of potentially positive anchor links in the unlabeled set is rewarding in the active anchor link prediction problem. Various novel anchor link information gain measures are defined in this paper, based on which several constrained active anchor link prediction methods are introduced. Extensive experiments have been conducted on real-world social network datasets to compare the performance of these methods with state-of-the-art anchor link prediction methods. The experimental results show that the proposed methods can outperform the other methods with significant advantages.
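The label propagation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `propagate_positive`, the tuple representation of an anchor link, and the account identifiers are all assumptions made for illustration. It shows only how confirming one positive anchor link (u, v) lets every other unlabeled link that shares u or v be marked negative without further queries.

```python
from typing import Set, Tuple

# Hypothetical representation: (account in network 1, account in network 2).
AnchorLink = Tuple[str, str]


def propagate_positive(unlabeled: Set[AnchorLink],
                       positive: AnchorLink) -> Set[AnchorLink]:
    """Return the unlabeled links that become negative once `positive`
    is confirmed, i.e. every other link that shares account u or account v."""
    u, v = positive
    return {
        (a, b)
        for (a, b) in unlabeled
        if (a, b) != positive and (a == u or b == v)
    }


# Toy candidate set (illustrative data, not from the paper).
unlabeled = {("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2")}
confirmed = ("u1", "v1")

inferred_negative = propagate_positive(unlabeled, confirmed)
remaining = unlabeled - inferred_negative - {confirmed}
```

In this toy case a single query labels three of the four candidate links, which is why querying likely-positive links is more informative here than in a standard active learning setting.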