Li Weijuan
Dean's Office, Yellow River Conservancy Technical Institute, Kaifeng, 475004, Henan, China.
Heliyon. 2024 Jul 15;10(14):e34685. doi: 10.1016/j.heliyon.2024.e34685. eCollection 2024 Jul 30.
The number of published scientific articles grows daily, which makes searching for relevant articles increasingly difficult. Searching only by matching the titles or content of other articles is inefficient, so there is a strong need for recommender systems (RSs) dedicated to suggesting scientific articles. This research combines content analysis and citation network analysis to design an RS for scientific articles (RECSA). In RECSA, natural language processing and deep learning techniques are used to process article titles and extract the content attributes of the articles. First, the titles are pre-processed, and the importance of each word in a title is estimated using the Term Frequency-Inverse Document Frequency (TF-IDF) criterion. The dimensionality of the resulting attributes is then reduced using a convolutional neural network (CNN), and the content similarity matrix of the articles is computed from the attribute vectors using cosine similarity. Next, a link prediction approach is used to analyze the connections in the citation network of the articles. Finally, in the third step of RECSA, the two similarity matrices calculated in the previous steps are combined using an influence coefficient parameter to obtain the final similarity matrix, and articles are recommended according to the highest similarity values. The efficiency of RECSA has been evaluated from different aspects, and the results have been compared with previous works. According to the results, combining TF-IDF and CNN to analyze content-based features improves precision by at least 0.32% compared to previous works. In addition, by integrating citation and content-based data, the precision of the first recommendation in RECSA reaches 99.01%, an improvement of at least 0.9% over the compared methods. The results show that RECSA makes recommendations with higher accuracy and efficiency.
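The three steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the CNN-based dimensionality reduction is omitted, the Jaccard coefficient stands in for the unspecified link prediction measure, and the weighted sum `alpha * content + (1 - alpha) * citation` is only one plausible reading of the "influence coefficient". All function and parameter names (`tfidf_vectors`, `combined_similarity`, `recommend`, `alpha`) are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(titles):
    """Return one sparse TF-IDF vector (dict) per title, using whitespace tokens."""
    docs = [title.lower().split() for title in titles]
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            # Smoothed IDF keeps every weight strictly positive.
            word: (count / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[word] for word, weight in u.items() if word in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard coefficient of two neighbour sets (assumed link prediction score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(titles, citation_edges, alpha=0.5):
    """Blend content and citation similarity; alpha is the assumed influence coefficient."""
    n = len(titles)
    vectors = tfidf_vectors(titles)
    # Treat citations as undirected links when collecting neighbours.
    neighbours = [set() for _ in range(n)]
    for src, dst in citation_edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                content = cosine(vectors[i], vectors[j])
                citation = jaccard(neighbours[i], neighbours[j])
                sim[i][j] = alpha * content + (1 - alpha) * citation
    return sim


def recommend(sim, article, k=1):
    """Return the k articles most similar to the given article index."""
    others = [j for j in range(len(sim)) if j != article]
    return sorted(others, key=lambda j: -sim[article][j])[:k]
```

With `alpha = 1.0` the ranking depends only on title content, and with `alpha = 0.0` only on the citation network; intermediate values trade the two signals off against each other.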