Research on Keyword Extraction Algorithm in English Text Based on Cluster Analysis.

Affiliation

School of Western Languages and Cultures, Harbin Normal University, Harbin 150025, China.

Publication information

Comput Intell Neurosci. 2022 Mar 28;2022:4293102. doi: 10.1155/2022/4293102. eCollection 2022.

DOI: 10.1155/2022/4293102

PMID: 35387240

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8979710/

Abstract

Helping users find the text information they need quickly and accurately is a current research hotspot. Text clustering improves the efficiency of information search and is an effective text retrieval method. Keyword extraction and the selection of cluster center points are key problems in text clustering research. Common keyword extraction algorithms fall into three categories: semantic-based, machine learning-based, and statistical model-based algorithms. There are likewise three common ways to select cluster centers: choosing the initial cluster centers at random, specifying them manually, or selecting them according to the similarity between the points to be clustered. Randomly selected initial centers may include "outliers," so the clustering can converge to a local optimum. Manually specified centers are highly subjective, because each person understands the text set differently, and the approach is unsuitable for large text collections. Selecting centers according to the similarity between the points to be clustered spreads the chosen centers across the classes and places them as close as possible to the true class centers, but computing them takes a long time. To address this problem, this paper proposes a keyword extraction algorithm based on cluster analysis. The algorithm does not rely on background knowledge bases, dictionaries, or similar resources; instead, it obtains statistical parameters and builds its model through training. Experiments show that the algorithm extracts keywords with high accuracy and can quickly extract the subject content of English translations.
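The abstract does not spell out the proposed algorithm, so the following is not the authors' method. It is a minimal sketch, under stated assumptions, of two ideas the abstract names: a statistical keyword weighting (TF-IDF here, an assumption) and similarity-based selection of initial cluster centers (here, an assumed rule: start from the document most similar to all others, then repeatedly add the document least similar to the centers already chosen). It then runs cosine k-means and takes each cluster's highest-weighted centroid terms as that cluster's keywords. All function names (`tfidf_vectors`, `select_centers`, `kmeans`, `cluster_keywords`) and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        # Smoothed IDF keeps terms that occur in every document at a small positive weight.
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_centers(vectors, k):
    """Assumed similarity-based seeding: pick the document most similar to all
    others, then repeatedly add the document least similar to its nearest chosen center."""
    n = len(vectors)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of documents")
    # O(n^2) similarity computations: the cost the abstract attributes to this approach.
    first = max(range(n), key=lambda i: sum(cosine(vectors[i], v) for v in vectors))
    centers = [first]
    while len(centers) < k:
        candidates = [i for i in range(n) if i not in centers]
        centers.append(min(candidates,
                           key=lambda i: max(cosine(vectors[i], vectors[c]) for c in centers)))
    return centers


def centroid(members):
    """Mean of sparse vectors."""
    total = Counter()
    for v in members:
        total.update(v)  # Counter.update adds the (float) weights term by term
    return {t: w / len(members) for t, w in total.items()}


def kmeans(vectors, k, max_iter=20):
    """Cosine k-means seeded by select_centers; returns (labels, centroids)."""
    centers = [vectors[i] for i in select_centers(vectors, k)]
    labels = []
    for _ in range(max_iter):
        # Assign each document to its most similar center.
        labels = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:  # assignments and centroids are stable
            break
        centers = new_centers
    return labels, centers


def cluster_keywords(center, top_n=3):
    """Highest-weighted centroid terms; ties broken alphabetically."""
    ranked = sorted(center.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


docs = [
    "cluster centers group similar documents",
    "similar documents share cluster centers",
    "keyword extraction ranks important words",
    "important words guide keyword extraction",
]
labels, centers = kmeans(tfidf_vectors(docs), k=2)
print(labels)  # [0, 0, 1, 1]
for center in centers:
    print(cluster_keywords(center))
# ['centers', 'cluster', 'documents']
# ['extraction', 'important', 'keyword']
```

Seeding this way is deterministic, unlike random initialization, but the first step compares every pair of documents, which mirrors the computational cost the abstract attributes to similarity-based center selection. A farthest-first rule can still pick an isolated document as a center, so it is only one of many possible similarity-based seeding choices.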

Similar articles

Research on Keyword Extraction Algorithm in English Text Based on Cluster Analysis.

Comput Intell Neurosci. 2022 Mar 28;2022:4293102. doi: 10.1155/2022/4293102. eCollection 2022.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Intelligent Sports Video Classification Based on Deep Neural Network (DNN) Algorithm and Transfer Learning.

Comput Intell Neurosci. 2021 Nov 24;2021:1825273. doi: 10.1155/2021/1825273. eCollection 2021.

TextRank Keyword Extraction Algorithm Using Word Vector Clustering Based on Rough Data-Deduction.

Comput Intell Neurosci. 2022 Jan 25;2022:5649994. doi: 10.1155/2022/5649994. eCollection 2022.

Research and Application of Clustering Algorithm for Text Big Data.

Comput Intell Neurosci. 2022 Jun 8;2022:7042778. doi: 10.1155/2022/7042778. eCollection 2022.

Construction of English and American Literature Corpus Based on Machine Learning Algorithm.

Comput Intell Neurosci. 2022 Jun 2;2022:9773452. doi: 10.1155/2022/9773452. eCollection 2022.

Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification.

Entropy (Basel). 2018 Feb 2;20(2):104. doi: 10.3390/e20020104.

Realization of English Instructional Resources Clusters Reconstruction System Using the Machine Learning Model.

Comput Intell Neurosci. 2022 Jul 9;2022:2838935. doi: 10.1155/2022/2838935. eCollection 2022.

Extraction of English Keyword Information Based on CAD Mesh Model.

Comput Intell Neurosci. 2022 Aug 20;2022:2391898. doi: 10.1155/2022/2391898. eCollection 2022.

A vector reconstruction based clustering algorithm particularly for large-scale text collection.

Neural Netw. 2015 Mar;63:141-55. doi: 10.1016/j.neunet.2014.10.012. Epub 2014 Dec 9.
