Ma Yingying, Wu Youlong, Lu Chengqiang
School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China.
Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China.
Entropy (Basel). 2020 Apr 7;22(4):416. doi: 10.3390/e22040416.
Name ambiguity, which arises because many people share the same name, often degrades the performance of information integration, document retrieval, and web search. In academic data analysis, author name ambiguity likewise reduces the quality of the analysis. To address this problem, the author name disambiguation task partitions the documents associated with an author name reference into several groups, each corresponding to one real-life person. Existing methods usually use either the attributes of documents or the relationships between documents and co-authors. However, feature extraction based on attributes makes models inflexible, while solutions based on relationship graph networks ignore the information contained in the features. In this paper, we propose a novel name disambiguation model based on representation learning that incorporates both attributes and relationships. Experiments on a public real-world dataset demonstrate the effectiveness of our model and show that it outperforms several state-of-the-art graph-based methods. We also improve the interpretability of our method through information theory and show that this analysis can be helpful for model selection and the training process.
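To make the task concrete, the following is a minimal sketch of name disambiguation as a partitioning problem. It is not the representation-learning model proposed in the paper. As a stand-in, it assumes a simple weighted Jaccard score that mixes an attribute signal (keywords) with a relationship signal (shared co-authors), and merges documents with union-find. The function names, the `threshold` and `weight` parameters, and the sample records are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(docs, threshold=0.3, weight=0.5):
    """Partition documents sharing one ambiguous author name into groups.

    Each group is meant to correspond to one real-life person.
    `threshold` and `weight` are illustrative assumptions, not values
    taken from the paper.
    """
    parent = list(range(len(docs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        attr = jaccard(docs[i]["keywords"], docs[j]["keywords"])    # attributes
        rel = jaccard(docs[i]["coauthors"], docs[j]["coauthors"])   # relationships
        score = weight * attr + (1 - weight) * rel
        if score >= threshold:
            parent[find(i)] = find(j)  # same person: merge the two groups

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical documents that all list the same ambiguous author name.
docs = [
    {"keywords": ["graph", "embedding"], "coauthors": ["A. Chen", "B. Li"]},
    {"keywords": ["graph", "clustering"], "coauthors": ["A. Chen"]},
    {"keywords": ["protein", "folding"], "coauthors": ["C. Zhao"]},
]
clusters = disambiguate(docs)
print(clusters)  # [[0, 1], [2]]
```

Using either signal alone (setting `weight` to 1.0 or 0.0) mirrors the two families of existing methods described above, which is why the sketch combines them in one score.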