Research on Keyword Extraction Algorithm in English Text Based on Cluster Analysis.

Affiliation

School of Western Languages and Cultures, Harbin Normal University, Harbin 150025, China.

Publication details

Comput Intell Neurosci. 2022 Mar 28;2022:4293102. doi: 10.1155/2022/4293102. eCollection 2022.

Abstract

Helping users find the text information they need quickly and accurately is an active research topic. Text clustering improves the efficiency of information search and is an effective text retrieval method. Keyword extraction and cluster center selection are key problems in text clustering research. Common keyword extraction algorithms fall into three categories: semantic-based, machine learning-based, and statistical model-based algorithms. There are likewise three common ways to choose cluster centers: selecting the initial centers at random, specifying them manually, and selecting them according to the similarity between the points to be clustered. Randomly selected initial centers may include outliers, so the clustering can settle in a local optimum. Manually specified centers are highly subjective, because each person understands the text set differently, and the approach does not scale to large text sets. Selecting centers by the similarity between points spreads the chosen centers across the classes and keeps each one as close as possible to its class center, but computing the centers this way is time-consuming. To address this problem, this paper proposes a keyword extraction algorithm based on cluster analysis. The algorithm does not rely on background knowledge bases, dictionaries, or similar resources; instead, it obtains its statistical parameters and builds its model through training. Experiments show that the algorithm achieves high accuracy and can quickly extract the subject content of English translations.
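The abstract names two ingredients, statistical keyword weighting and similarity-based cluster center selection, without giving formulas. The Python sketch below shows one generic way to combine them and is not the paper's algorithm. It assumes TF-IDF term weights (using the smoothed IDF ln((1+n)/(1+df)) + 1, the same form as scikit-learn's default), cosine similarity, a max-min heuristic standing in for "similarity-based" center selection, and a few k-means-style refinement passes; each cluster's highest-weighted centroid terms are then read off as its keywords. All function names and the tiny stopword list are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption; real systems use a larger one).
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is", "are", "for", "on", "with"})


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for every document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        # Smoothed IDF keeps terms shared by every document from dropping to zero.
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def select_centers(vectors, k):
    """Hypothetical similarity-based seeding using a max-min heuristic."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # First center: the document with the highest total similarity to all documents.
    centers = [max(range(n), key=lambda i: sum(sim[i]))]
    while len(centers) < k:
        # Next center: the document least similar to its closest existing center.
        candidates = [i for i in range(n) if i not in centers]
        centers.append(min(candidates, key=lambda i: max(sim[i][c] for c in centers)))
    return centers


def cluster_keywords(docs, k, top_n=3, iterations=5):
    """Cluster documents by cosine similarity and return (labels, keywords per cluster)."""
    vectors = tfidf_vectors(docs)
    centroids = [vectors[i] for i in select_centers(vectors, k)]
    labels = []
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Recompute each centroid as the mean of its members' vectors.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                acc = Counter()
                for v in members:
                    acc.update(v)  # adds weights term by term
                centroids[c] = {t: w / len(members) for t, w in acc.items()}
    keywords = []
    for centroid in centroids:
        # Highest weight first; ties broken alphabetically for determinism.
        ranked = sorted(centroid.items(), key=lambda tw: (-tw[1], tw[0]))
        keywords.append([t for t, _ in ranked[:top_n]])
    return labels, keywords


if __name__ == "__main__":
    docs = [
        "clustering algorithms group similar text documents",
        "text clustering improves document retrieval",
        "keyword extraction finds important words in text",
        "statistical keyword extraction ranks words by frequency",
    ]
    print(cluster_keywords(docs, k=2))
```

On the four sample sentences, the two about clustering and the two about keyword extraction should each end up in their own cluster. Seeding deterministically from the most central document avoids the run-to-run variation of random initialization, but it computes every pairwise similarity (quadratic in the number of documents), which mirrors the cost the abstract attributes to similarity-based center selection.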
