FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data.

Author information

Tahir Noman, Asif Muhammad, Ahmad Shahbaz, Malik Muhammad Sheraz Arshad, Aljuaid Hanan, Butt Muhammad Arif, Rehman Mobashar

Affiliations

Department of Computer Science, National Textile University, Faisalabad, Punjab, Pakistan.

Department of Information Technology, Government College University, Faisalabad, Faisalabad, Punjab, Pakistan.

Publication information

PeerJ Comput Sci. 2021 Mar 11;7:e389. doi: 10.7717/peerj-cs.389. eCollection 2021.

DOI:10.7717/peerj-cs.389

PMID:33817035

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7959634/

Abstract

Keyword extraction is essential for identifying influential keywords in large document collections, as research repositories grow in volume day by day. The research community is drowning in data and starving for information. Keywords are a small set of words that precisely describe the theme of a whole document. Many state-of-the-art approaches exist for extracting keywords from large document collections, and they fall into three categories: statistical approaches, machine learning approaches, and graph-based methods. Machine learning approaches require a large training dataset that domain experts must build manually, and such a dataset is sometimes difficult to produce when determining influential keywords. This research therefore focused on enhancing state-of-the-art graph-based methods to extract keywords when no training dataset is available. The handcrafted dataset, collected from impact-factor journals, was first converted into n-gram combinations ranging from unigrams to pentagrams, and traditional graph-based approaches were enhanced. All methods were applied to this handcrafted dataset, and domain experts performed a user study to evaluate the results. The output of every method was compared against the user study using precision, recall, and F-measure as evaluation metrics. The results showed that the proposed method (FNG-IE) performed well, scoring close to the machine learning approaches.
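The abstract names two concrete steps without detailing them: expanding text into unigram-to-pentagram candidates, and scoring each method's output against the experts' keywords with precision, recall, and F-measure. The following is a minimal sketch of those two steps only, not of the FNG-IE graph ranking itself. It assumes whitespace-tokenized text and exact string matching between predicted and expert keywords; neither assumption comes from the paper. The function names `ngrams` and `precision_recall_f1` are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def ngrams(tokens: list[str], n_min: int = 1, n_max: int = 5) -> list[str]:
    """Return all contiguous n-grams of length n_min..n_max.

    The defaults cover unigrams through pentagrams (5-grams). Tokens are
    assumed to be pre-tokenized (e.g. split on whitespace); this is an
    illustrative assumption, not the paper's preprocessing.
    """
    out: list[str] = []
    for n in range(n_min, n_max + 1):
        # A sequence of length L has L - n + 1 windows of size n (none if n > L).
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def precision_recall_f1(
    predicted: Iterable[str], gold: Iterable[str]
) -> tuple[float, float, float]:
    """Set-based precision, recall and F-measure (harmonic mean of P and R).

    Assumes exact string matching against the expert-selected keywords;
    returns 0.0 for any ratio whose denominator would be zero.
    """
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)                      # keywords both extracted and expert-approved
    p = hits / len(pred) if pred else 0.0       # fraction of extracted keywords that are correct
    r = hits / len(ref) if ref else 0.0         # fraction of expert keywords that were found
    f = 2 * p * r / (p + r) if p + r else 0.0   # F-measure (F1)
    return p, r, f
```

For example, `ngrams("graph based keyword extraction".split(), 1, 2)` yields the four unigrams followed by `"graph based"`, `"based keyword"`, and `"keyword extraction"`. A real pipeline would typically also filter stop words and score the candidates, which is where a graph-based ranking would come in.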

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68c4/7959634/80f28c98471b/peerj-cs-07-389-g001.jpg

Similar articles

Word synonym relationships for text analysis: A graph-based approach.

PLoS One. 2021 Jul 27;16(7):e0255127. doi: 10.1371/journal.pone.0255127. eCollection 2021.

Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph.

J Biomed Semantics. 2023 Nov 28;14(1):18. doi: 10.1186/s13326-023-00298-4.

Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification.

Entropy (Basel). 2018 Feb 2;20(2):104. doi: 10.3390/e20020104.

Impact analysis of keyword extraction using contextual word embedding.

PeerJ Comput Sci. 2022 May 30;8:e967. doi: 10.7717/peerj-cs.967. eCollection 2022.

Beyond the Failure of Direct-Matching in Keyword Evaluation: A Sketch of a Graph Based Solution.

Front Artif Intell. 2022 Mar 24;5:801564. doi: 10.3389/frai.2022.801564. eCollection 2022.

Automated Classification of Radiology Reports for Acute Lung Injury: Comparison of Keyword and Machine Learning Based Natural Language Processing Approaches.

Proceedings (IEEE Int Conf Bioinformatics Biomed). 2009 Nov;2009:314-319. doi: 10.1109/BIBMW.2009.5332081.

Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords.

Biomed Res Int. 2015;2015:928531. doi: 10.1155/2015/928531. Epub 2015 Dec 10.

The importance of the keyword-generation method in keyword mnemonics.

Exp Psychol. 2004;51(2):125-31. doi: 10.1027/1618-3169.51.2.125.

PubMed Author-assigned Keyword Extraction (PubMedAKE) Benchmark.

Proc ACM Int Conf Inf Knowl Manag. 2022 Oct;2022:4470-4474. doi: 10.1145/3511808.3557675. Epub 2022 Oct 17.

Cited by

Study of Efficacy of a Novel Formative Assessment Tool: Keywords Recall.

Cureus. 2024 Sep 21;16(9):e69881. doi: 10.7759/cureus.69881. eCollection 2024 Sep.

A study on the classification of stylistic and formal features in English based on corpus data testing.

PeerJ Comput Sci. 2023 Apr 25;9:e1297. doi: 10.7717/peerj-cs.1297. eCollection 2023.

Automatic computer science domain multiple-choice questions generation based on informative sentences.

PeerJ Comput Sci. 2022 Aug 16;8:e1010. doi: 10.7717/peerj-cs.1010. eCollection 2022.

Impact analysis of keyword extraction using contextual word embedding.

PeerJ Comput Sci. 2022 May 30;8:e967. doi: 10.7717/peerj-cs.967. eCollection 2022.