FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data.

Author information

Tahir Noman, Asif Muhammad, Ahmad Shahbaz, Malik Muhammad Sheraz Arshad, Aljuaid Hanan, Butt Muhammad Arif, Rehman Mobashar

Affiliations

Department of Computer Science, National Textile University, Faisalabad, Punjab, Pakistan.

Department of Information Technology, Government College University, Faisalabad, Faisalabad, Punjab, Pakistan.

Publication information

PeerJ Comput Sci. 2021 Mar 11;7:e389. doi: 10.7717/peerj-cs.389. eCollection 2021.

Abstract

Keyword extraction is essential for identifying influential keywords in large documents, as research repositories grow in volume day by day. The research community is drowning in data and starving for information. Keywords are the few words that precisely describe the theme of a whole document. Many state-of-the-art approaches are available for extracting keywords from large document collections; they fall into three types: statistical approaches, machine learning approaches, and graph-based methods. Machine learning approaches require a large training dataset that domain experts must develop manually, which is sometimes difficult to produce. This research therefore focused on enhancing state-of-the-art graph-based methods to extract keywords when no training dataset is available. It first converted a handcrafted dataset, collected from impact-factor journals, into n-gram combinations ranging from unigrams to pentagrams, and also enhanced traditional graph-based approaches. The experiment was conducted on the handcrafted dataset, and all methods were applied to it. Domain experts performed a user study to evaluate the results. The results of every method were evaluated against the user study using precision, recall, and F-measure as evaluation metrics. The results showed that the proposed method (FNG-IE) performed well, scoring close to the machine learning approaches.
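
As a rough illustration of two steps the abstract mentions, the sketch below generates n-gram candidates from unigrams to pentagrams and scores a predicted keyword set against an expert-chosen reference set with precision, recall, and F-measure. It is a minimal sketch, not the FNG-IE method itself: the function names `ngrams` and `precision_recall_f1`, the whitespace tokenization, and the sample inputs are all assumptions made for illustration, and the graph-based ranking step is not shown.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], max_n: int = 5) -> List[str]:
    """Return all contiguous n-grams for n = 1..max_n, joined by spaces.

    Hypothetical helper: the paper's own preprocessing is not described
    in the abstract.
    """
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def precision_recall_f1(predicted: Set[str],
                        relevant: Set[str]) -> Tuple[float, float, float]:
    """Score predicted keywords against a reference set (e.g. expert picks)."""
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative input only; not taken from the paper's dataset.
    candidates = ngrams("graph based keyword extraction".split())
    print(candidates)
    print(precision_recall_f1({"keyword extraction", "graph"},
                              {"keyword extraction", "graph based", "big data"}))
```

A sequence shorter than five tokens simply yields no n-grams of the longer lengths, so the same call works for short titles and full abstracts alike.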

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/68c4/7959634/80f28c98471b/peerj-cs-07-389-g001.jpg
