FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data.

Author information

Tahir Noman, Asif Muhammad, Ahmad Shahbaz, Malik Muhammad Sheraz Arshad, Aljuaid Hanan, Butt Muhammad Arif, Rehman Mobashar

Affiliations

Department of Computer Science, National Textile University, Faisalabad, Punjab, Pakistan.

Department of Information Technology, Government College University, Faisalabad, Faisalabad, Punjab, Pakistan.

Publication information

PeerJ Comput Sci. 2021 Mar 11;7:e389. doi: 10.7717/peerj-cs.389. eCollection 2021.

DOI: 10.7717/peerj-cs.389
PMID: 33817035
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7959634/
Abstract

Keyword extraction is essential for identifying influential keywords in huge document collections, as research repositories grow in volume day by day. The research community is drowning in data and starving for information. Keywords are the few words that describe the theme of a whole document precisely. Many state-of-the-art approaches are available for extracting keywords from large document collections, and they fall into three types: statistical approaches, machine learning approaches, and graph-based methods. Machine learning approaches require a large training dataset that must be built manually by domain experts, which is sometimes difficult to produce when determining influential keywords. This research therefore focused on enhancing state-of-the-art graph-based methods to extract keywords when no training dataset is available. The research first converted a handcrafted dataset, collected from impact-factor journals, into n-gram combinations ranging from unigrams to pentagrams, and then enhanced traditional graph-based approaches. The experiment was conducted on the handcrafted dataset, and all methods were applied to it. Domain experts performed a user study to evaluate the results. The results of every method were evaluated against the user study using precision, recall, and F-measure as evaluation metrics. The results showed that the proposed method (FNG-IE) performed well and scored close to the machine learning approaches.
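The abstract names the ingredients of the pipeline (candidate n-grams from unigrams to pentagrams, a graph-based ranking, and evaluation by precision, recall and F-measure against an expert user study) but does not spell out FNG-IE's own scoring rules. The Python sketch below is therefore a generic illustration under stated assumptions, not the authors' method: it builds a word co-occurrence graph over a sliding window, scores each word by its weighted degree, scores each candidate n-gram as its mean word degree multiplied by its frequency, and compares the top-k candidates with an expert keyword list. The stop-word list, window size, scoring formula and every function name (`extract_keywords`, `precision_recall_f`, and so on) are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption; real systems use a fuller list).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
             "of", "on", "that", "the", "this", "to", "with"}


def tokenize(text):
    """Split text into sentences of lowercase tokens, dropping stop words."""
    result = []
    for sentence in re.split(r"[.!?;]+", text.lower()):
        tokens = [t for t in re.findall(r"[a-z][a-z0-9-]*", sentence)
                  if t not in STOPWORDS]
        if tokens:
            result.append(tokens)
    return result


def ngrams(tokens, max_n=5):
    """All contiguous n-grams from unigrams up to max_n (5 = pentagram)."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_graph(sentences, window=2):
    """Weighted co-occurrence graph: words at most window-1 positions apart
    in the same sentence are linked; edge weight = co-occurrence count."""
    graph = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(i + 1, min(i + window, len(tokens))):
                other = tokens[j]
                if other != word:  # skip self-loops
                    graph[word][other] += 1
                    graph[other][word] += 1
    return graph


def weighted_degree(graph):
    """Score each word by the sum of its edge weights (degree centrality)."""
    return {word: sum(nbrs.values()) for word, nbrs in graph.items()}


def extract_keywords(text, k=5, max_n=5, window=2):
    """Rank candidate n-grams by mean word degree times n-gram frequency
    (a hypothetical scoring rule chosen for illustration)."""
    sentences = tokenize(text)
    degree = weighted_degree(build_graph(sentences, window))
    freq = Counter(g for tokens in sentences for g in ngrams(tokens, max_n))
    scores = {}
    for gram, count in freq.items():
        words = gram.split()
        mean_degree = sum(degree.get(w, 0) for w in words) / len(words)
        scores[gram] = mean_degree * count
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


def precision_recall_f(predicted, gold):
    """Set-based precision, recall and F-measure against expert keywords."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    text = ("Graph-based keyword extraction ranks words. "
            "Keyword extraction needs no training data.")
    top = extract_keywords(text, k=3)
    print(top)  # → ['extraction', 'keyword extraction', 'keyword']
    print(precision_recall_f(top, ["keyword extraction",
                                   "graph-based keyword extraction"]))
```

Multiplying by frequency keeps long n-grams that occur only once from outranking frequent short phrases; summing the member-word scores instead would systematically favour the longest candidates, such as pentagrams.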

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68c4/7959634/84ede99c3804/peerj-cs-07-389-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68c4/7959634/80f28c98471b/peerj-cs-07-389-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68c4/7959634/b39416f90c1a/peerj-cs-07-389-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68c4/7959634/c7a2a6003105/peerj-cs-07-389-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68c4/7959634/551dfc353d81/peerj-cs-07-389-g007.jpg

Similar articles

1
FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data.
PeerJ Comput Sci. 2021 Mar 11;7:e389. doi: 10.7717/peerj-cs.389. eCollection 2021.
2
Word synonym relationships for text analysis: A graph-based approach.
PLoS One. 2021 Jul 27;16(7):e0255127. doi: 10.1371/journal.pone.0255127. eCollection 2021.
3
Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph.
J Biomed Semantics. 2023 Nov 28;14(1):18. doi: 10.1186/s13326-023-00298-4.
4
Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification.
Entropy (Basel). 2018 Feb 2;20(2):104. doi: 10.3390/e20020104.
5
Impact analysis of keyword extraction using contextual word embedding.
PeerJ Comput Sci. 2022 May 30;8:e967. doi: 10.7717/peerj-cs.967. eCollection 2022.
6
Beyond the Failure of Direct-Matching in Keyword Evaluation: A Sketch of a Graph Based Solution.
Front Artif Intell. 2022 Mar 24;5:801564. doi: 10.3389/frai.2022.801564. eCollection 2022.
7
Automated Classification of Radiology Reports for Acute Lung Injury: Comparison of Keyword and Machine Learning Based Natural Language Processing Approaches.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2009 Nov;2009:314-319. doi: 10.1109/BIBMW.2009.5332081.
8
Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords.
Biomed Res Int. 2015;2015:928531. doi: 10.1155/2015/928531. Epub 2015 Dec 10.
9
The importance of the keyword-generation method in keyword mnemonics.
Exp Psychol. 2004;51(2):125-31. doi: 10.1027/1618-3169.51.2.125.
10
PubMed Author-assigned Keyword Extraction (PubMedAKE) Benchmark.
Proc ACM Int Conf Inf Knowl Manag. 2022 Oct;2022:4470-4474. doi: 10.1145/3511808.3557675. Epub 2022 Oct 17.

Cited by

1
Study of Efficacy of a Novel Formative Assessment Tool: Keywords Recall.
Cureus. 2024 Sep 21;16(9):e69881. doi: 10.7759/cureus.69881. eCollection 2024 Sep.
2
A study on the classification of stylistic and formal features in English based on corpus data testing.
PeerJ Comput Sci. 2023 Apr 25;9:e1297. doi: 10.7717/peerj-cs.1297. eCollection 2023.
3
Automatic computer science domain multiple-choice questions generation based on informative sentences.
PeerJ Comput Sci. 2022 Aug 16;8:e1010. doi: 10.7717/peerj-cs.1010. eCollection 2022.
4
Impact analysis of keyword extraction using contextual word embedding.
PeerJ Comput Sci. 2022 May 30;8:e967. doi: 10.7717/peerj-cs.967. eCollection 2022.