Hybrid self-optimized clustering model based on citation links and textual features to detect research topics.

Authors

Yu Dejian, Wang Wanru, Zhang Shuai, Zhang Wenyu, Liu Rongyu

Affiliation

School of Information, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China.

Publication

PLoS One. 2017 Oct 27;12(10):e0187164. doi: 10.1371/journal.pone.0187164. eCollection 2017.

DOI: 10.1371/journal.pone.0187164
PMID: 29077747
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5659815/

Abstract

The challenge of detecting research topics in a specific research field has attracted attention from researchers in the bibliometrics community. This study addresses two problems in clustering papers: how differently distributed citation links affect similarity computation, and how the textual features involved affect it. To solve them, the authors propose a hybrid self-optimized clustering model for detecting research topics, which extends the hybrid clustering model to identify "core documents". First, an Amsler network consisting of bibliographic coupling and co-citation links is built, and the citation-based similarity between papers is calculated as a cosine measure on this network. Second, cosine similarity is also used to compute the text-based similarity, which is built from textual statistical and topological features. Then, the hybrid similarity is taken as the cosine angle of a linear combination of the citation- and text-based similarities. Finally, the Louvain method is applied to cluster the papers, and terms selected by term frequency are used to label the clusters. To test the performance of the proposed model, a dataset from the data envelopment analysis field is used to compare and analyze the clustering results. Using the benchmark built for this purpose, clustering methods that rely on different citation links or textual features are compared on several evaluation measures. The results show that the proposed model obtains reasonable and effective clustering results, and the research topics of the data envelopment analysis field are also analyzed with it. Because the proposed model considers features that differ from those in previous hybrid clustering models, it can provide inspiration for further studies on topic identification by other researchers.

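The abstract does not give the formula behind the citation-based similarity. The following is a minimal sketch that makes three assumptions. The Amsler network is realized by describing each paper with the set of papers it cites (the bibliographic coupling side) plus the set of papers that cite it (the co-citation side). Similarity is the binary cosine of these sets. As a result, the numerator counts shared references plus shared citing papers. The input format and the names `amsler_features`, `binary_cosine` and `citation_similarity` are hypothetical and are not taken from the paper.

```python
from math import sqrt


def amsler_features(citations):
    """Build a binary feature set per paper from a citation map.

    citations: hypothetical input, {paper_id: set of paper_ids it cites}.
    A ("ref", x) feature means the paper cites x (bibliographic coupling side);
    a ("citer", y) feature means paper y cites it (co-citation side).
    """
    feats = {p: set() for p in citations}
    for citing, refs in citations.items():
        for ref in refs:
            feats[citing].add(("ref", ref))
            feats.setdefault(ref, set()).add(("citer", citing))
    return feats


def binary_cosine(a, b):
    """Cosine of two binary vectors stored as sets: |a & b| / sqrt(|a| * |b|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def citation_similarity(citations):
    """Amsler-style cosine similarity for every unordered pair of papers."""
    feats = amsler_features(citations)
    papers = sorted(feats)
    return {
        (p, q): binary_cosine(feats[p], feats[q])
        for i, p in enumerate(papers)
        for q in papers[i + 1:]
    }


if __name__ == "__main__":
    # Toy network: A and B share reference Y and are both cited by C.
    citations = {"A": {"X", "Y"}, "B": {"Y", "Z"}, "C": {"A", "B"}}
    sims = citation_similarity(citations)
    print(round(sims[("A", "B")], 3))  # 2 shared features out of 3 each -> 0.667
```

Binary sets keep the sketch short. A weighted variant would store citation counts instead and apply the general cosine formula.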

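The text side and the combination step can be sketched in the same way, with two stated assumptions.

- The abstract does not detail the paper's "textual statistical and topological features", so plain term-frequency vectors stand in for them here.
- "The cosine angle of the linear combination" is read in the angle-based form used in earlier hybrid clustering work (for example by Glänzel and Thijs). Each similarity is turned into an angle with arccos, and the two angles are mixed with a weight λ. The hybrid similarity is then cos(λ·arccos(s_cit) + (1 − λ)·arccos(s_text)).

The abstract states neither whether the paper uses exactly this form nor which λ it uses. The parameter `lam` and the helper names are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Term-frequency vector; a stand-in for the paper's richer textual features."""
    return Counter(text.lower().split())


def count_cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_similarity(s_cit, s_text, lam=0.5):
    """Mix two cosine similarities through their angles (assumed formula).

    Returns cos(lam * arccos(s_cit) + (1 - lam) * arccos(s_text)); inputs are
    clamped to [-1, 1] so floating-point noise cannot make math.acos fail.
    """
    theta_cit = math.acos(max(-1.0, min(1.0, s_cit)))
    theta_text = math.acos(max(-1.0, min(1.0, s_text)))
    return math.cos(lam * theta_cit + (1 - lam) * theta_text)


if __name__ == "__main__":
    t1 = term_vector("efficiency measurement with data envelopment analysis")
    t2 = term_vector("data envelopment analysis of bank efficiency")
    s_text = count_cosine(t1, t2)
    print(round(s_text, 3), round(hybrid_similarity(0.4, s_text), 3))
```

With `lam=1.0` the function returns the citation-based similarity, and with `lam=0.0` it returns the text-based one. Values in between interpolate the angles rather than the raw scores.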

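The final step clusters the papers with the Louvain method and labels each cluster by term frequency. Below is a compact, pure-Python sketch of the usual two-phase Louvain heuristic. It first makes greedy single-node moves that raise modularity, then collapses communities into super-nodes and repeats. A labelling helper then keeps the most frequent terms in each cluster. This is not the authors' implementation. The graph format, the sorted node visiting order and the `top_k` cut-off are assumptions made for illustration. In the model, the edge weights would be the hybrid similarities.

```python
from collections import Counter, defaultdict


def _local_moving(graph):
    """Phase 1: greedily move single nodes between communities while modularity rises.

    graph: {node: {neighbour: weight}}, symmetric, with mutually comparable node ids.
    After aggregation, graph[c][c] holds twice the weight inside community c.
    """
    k = {u: sum(nbrs.values()) for u, nbrs in graph.items()}  # weighted degrees
    m2 = sum(k.values())  # 2m: every edge weight counted from both ends
    comm = {u: u for u in graph}
    if m2 == 0:
        return comm, False
    tot = dict(k)  # summed degree of each community
    improved, moved = False, True
    while moved:
        moved = False
        for u in sorted(graph):
            cu = comm[u]
            links = defaultdict(float)  # weight from u into each neighbouring community
            for v, w in graph[u].items():
                if v != u:
                    links[comm[v]] += w
            tot[cu] -= k[u]  # take u out of its current community
            best, best_gain = cu, links[cu] - tot[cu] * k[u] / m2
            for c, w_uc in links.items():
                gain = w_uc - tot[c] * k[u] / m2  # modularity gain up to the factor 1/m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            tot[best] += k[u]
            if best != cu:
                comm[u] = best
                moved = improved = True
    labels = {}  # renumber communities as 0, 1, 2, ...
    return {u: labels.setdefault(c, len(labels)) for u, c in comm.items()}, improved


def _aggregate(graph, comm):
    """Phase 2: collapse each community into a single node, summing edge weights."""
    new = {}
    for u, nbrs in graph.items():
        row = new.setdefault(comm[u], {})
        for v, w in nbrs.items():
            row[comm[v]] = row.get(comm[v], 0.0) + w
    return new


def louvain(graph, max_levels=10):
    """Map each original node to a community id by alternating the two phases."""
    partition = {u: u for u in graph}
    for _ in range(max_levels):
        comm, improved = _local_moving(graph)
        if not improved:
            break
        partition = {u: comm[c] for u, c in partition.items()}
        graph = _aggregate(graph, comm)
    return partition


def label_clusters(partition, texts, top_k=3):
    """Label each cluster with its top_k most frequent terms (plain term frequency)."""
    counts = defaultdict(Counter)
    for paper, cluster in partition.items():
        counts[cluster].update(texts[paper].lower().split())
    return {c: [t for t, _ in cnt.most_common(top_k)] for c, cnt in counts.items()}


if __name__ == "__main__":
    # Two triangles joined by one edge, with unit weights for simplicity.
    graph = {}
    for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        graph.setdefault(u, {})[v] = 1.0
        graph.setdefault(v, {})[u] = 1.0
    partition = louvain(graph)
    print(partition)  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    texts = {0: "dea banks", 1: "dea hospitals", 2: "dea ranking",
             3: "malmquist index", 4: "malmquist change", 5: "malmquist growth"}
    print(label_clusters(partition, texts, top_k=1))  # {0: ['dea'], 1: ['malmquist']}
```

Visiting nodes in sorted order makes the result deterministic. Many Louvain implementations randomize the order instead, which can give different but comparably good partitions.
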
Similar articles

1
Hybrid self-optimized clustering model based on citation links and textual features to detect research topics.
PLoS One. 2017 Oct 27;12(10):e0187164. doi: 10.1371/journal.pone.0187164. eCollection 2017.
2
Active research fields in anesthesia: a document co-citation analysis of the anesthetic literature.
Anesth Analg. 2008 May;106(5):1524-33, table of contents. doi: 10.1213/ane.0b013e31816d18a1.
3
Hybrid Methods of Bibliographic Coupling and Text Similarity Measurement for Biomedical Paper Recommendation.
Stud Health Technol Inform. 2022 Jun 6;290:287-291. doi: 10.3233/SHTI220080.
4
Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods.
PLoS One. 2016 Apr 28;11(4):e0154404. doi: 10.1371/journal.pone.0154404. eCollection 2016.
5
Clustering more than two million biomedical publications: comparing the accuracies of nine text-based similarity approaches.
PLoS One. 2011 Mar 17;6(3):e18029. doi: 10.1371/journal.pone.0018029.
6
In-text citation's frequencies-based recommendations of relevant research papers.
PeerJ Comput Sci. 2021 Jun 4;7:e524. doi: 10.7717/peerj-cs.524. eCollection 2021.
7
Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions.
PLoS One. 2017 Oct 19;12(10):e0186566. doi: 10.1371/journal.pone.0186566. eCollection 2017.
8
Improving the text classification using clustering and a novel HMM to reduce the dimensionality.
Comput Methods Programs Biomed. 2016 Nov;136:119-30. doi: 10.1016/j.cmpb.2016.08.018. Epub 2016 Aug 26.
9
Towards semantically sensitive text clustering: a feature space modeling technology based on dimension extension.
PLoS One. 2015 Mar 20;10(3):e0117390. doi: 10.1371/journal.pone.0117390. eCollection 2015.
10
Using citation networks to evaluate the impact of text length on keyword extraction.
PLoS One. 2023 Nov 27;18(11):e0294500. doi: 10.1371/journal.pone.0294500. eCollection 2023.

Cited by

1
Article-level classification of scientific publications: A comparison of deep learning, direct citation and bibliographic coupling.
PLoS One. 2021 May 11;16(5):e0251493. doi: 10.1371/journal.pone.0251493. eCollection 2021.
2
Mind the gap: Developments in autonomous driving research and the sustainability challenge.
J Clean Prod. 2020 Dec 1;275:124087. doi: 10.1016/j.jclepro.2020.124087. Epub 2020 Sep 11.
