Hybrid self-optimized clustering model based on citation links and textual features to detect research topics.

Author information

Yu Dejian, Wang Wanru, Zhang Shuai, Zhang Wenyu, Liu Rongyu

Affiliation

School of Information, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China.

Publication information

PLoS One. 2017 Oct 27;12(10):e0187164. doi: 10.1371/journal.pone.0187164. eCollection 2017.

Abstract

Detecting research topics within a specific research field has attracted attention from the bibliometrics community. This study addresses two problems in clustering papers: the influence of differently distributed citation links, and of the textual features involved, on similarity computation. The authors propose a hybrid self-optimized clustering model that detects research topics by extending the hybrid clustering model used to identify "core documents". First, the Amsler network, consisting of bibliographic coupling and co-citation links, is created, and the citation-based similarity is calculated from the cosine angle between papers. Second, cosine similarity is also used to compute the text-based similarity, which draws on textual statistical and topological features. Then, the cosine angle of the linear combination of the citation-based and text-based similarities is taken as the hybrid similarity. Finally, the Louvain method is applied to cluster the papers, and terms selected by term frequency are used to label the clusters. To test the proposed model, a dataset from the data envelopment analysis field is used to compare and analyze clustering results. Using the benchmark constructed, clustering methods with different citation links or textual features are compared according to evaluation measures. The results show that the proposed model obtains reasonable and effective clustering results; the research topics of the data envelopment analysis field are also analyzed with the proposed model. Because it considers different features from previous hybrid clustering models, the proposed model may inform further studies on topic identification by other researchers.
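The similarity steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the Amsler links of a paper are simplified to the union of its references and its citing papers; plain term-frequency vectors stand in for the "textual statistical and topological features", which the abstract does not specify; and the hybrid similarity is read as the cosine of a weighted combination of the two similarity angles, with a hypothetical weight `lam`. All function names and the toy data are invented for illustration.

```python
import math
from collections import Counter


def amsler_links(cites):
    """Map each paper to its references plus the papers citing it.

    Assumption: a simplified reading of the Amsler network, in which
    sharing a reference (bibliographic coupling) or a citing paper
    (co-citation) both count as a shared link.
    """
    citers = {}
    for paper, refs in cites.items():
        for ref in refs:
            citers.setdefault(ref, set()).add(paper)
    return {p: set(refs) | citers.get(p, set()) for p, refs in cites.items()}


def cosine_sets(a, b):
    """Cosine similarity between two binary vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def cosine_counts(x, y):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(x[t] * y[t] for t in x.keys() & y.keys())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def hybrid_similarity(sim_cit, sim_text, lam=0.5):
    """Cosine of a weighted combination of the two similarity angles.

    Assumption: this is one reading of "the cosine angle of the linear
    combination"; lam is a hypothetical weight in [0, 1].
    """
    def clamp(s):
        # Guard acos against tiny floating-point overshoot.
        return max(-1.0, min(1.0, s))

    angle = lam * math.acos(clamp(sim_cit)) + (1 - lam) * math.acos(clamp(sim_text))
    return math.cos(angle)


# Toy data (hypothetical): p1 and p2 share reference r2 and are both cited by p3.
cites = {"p1": {"r1", "r2"}, "p2": {"r2", "r3"}, "p3": {"p1", "p2"}}
texts = {
    "p1": "efficiency analysis of banks",
    "p2": "efficiency analysis of hospitals",
}

links = amsler_links(cites)
sim_cit = cosine_sets(links["p1"], links["p2"])
sim_text = cosine_counts(Counter(texts["p1"].split()), Counter(texts["p2"].split()))
sim_hybrid = hybrid_similarity(sim_cit, sim_text, lam=0.5)

print(f"citation={sim_cit:.3f} text={sim_text:.3f} hybrid={sim_hybrid:.3f}")
```

In a fuller pipeline, the pairwise hybrid similarities would serve as edge weights in a paper graph passed to a Louvain implementation (for example, NetworkX's `louvain_communities`), and each resulting cluster could be labeled with its most frequent terms. Those steps are left out here to keep the sketch dependency-free.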


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/479e/5659815/a74e87a93332/pone.0187164.g001.jpg
