Zeng Qingxiang
College of Humanities and Media, Hubei University of Science and Technology, Xianning, Hubei, China.
PeerJ Comput Sci. 2024 Aug 1;10:e2213. doi: 10.7717/peerj-cs.2213. eCollection 2024.
Traditional text-mining methods can be inefficient on large-scale data and often struggle to identify and cluster relevant information accurately. Conventional techniques also have difficulty capturing nuanced sentiment and emotional context in news text. To address these issues, this article introduces an improved bidirectional-Kmeans-long short-term memory network-convolutional neural network (BiK-LSTM-CNN) model that incorporates emotional semantic analysis for high-dimensional news text visual extraction and media hotspot mining. The approach comprises four modules: news text preprocessing, news text clustering, sentiment semantic analysis, and the BiK-LSTM-CNN model itself. Combined, these components identify common features in the input data, cluster similar news articles, and analyze the emotional semantics of the text, which improves both the accuracy and the efficiency of visual extraction and hotspot mining. Experimental results show that, compared with Transformer, AdvLSTM, and NewRNN, BiK-LSTM-CNN improves macro accuracy by 0.50%, 0.91%, and 1.34%, respectively; macro recall by 0.51%, 1.24%, and 1.26%; and macro F1 score by 0.52%, 1.23%, and 1.92%. The model also shows significant improvements in time efficiency, supporting its potential as a more effective approach for processing and analyzing large-scale text data.
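The abstract reports macro-averaged scores without defining them. As a point of reference only, the following minimal Python sketch shows the conventional definition, in which precision, recall, and F1 are computed per class and then averaged with equal weight. It assumes this is the averaging the reported figures use, which the abstract does not confirm. The function name `macro_scores` and the toy labels are hypothetical, and the code is not the article's evaluation code.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Assumption: each score is computed per class and then averaged
    with equal weight per class, the usual meaning of "macro".
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives, and false negatives per class.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Guard against classes that were never predicted or never occur.
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with two hypothetical news categories.
y_true = ["sports", "sports", "finance", "finance"]
y_pred = ["sports", "finance", "finance", "finance"]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(f"{macro_p:.4f} {macro_r:.4f} {macro_f1:.4f}")
```

Because every class carries equal weight, a macro average is sensitive to performance on rare categories, which is relevant when news topics are unevenly represented.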