Zhu Hou, Shuhuai Li
School of Information Management, Sun Yat-sen University, Guangzhou, China.
Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, NSW, Australia.
PLoS One. 2024 Dec 2;19(12):e0312945. doi: 10.1371/journal.pone.0312945. eCollection 2024.
With the number of academic researchers growing steadily, the volume of scientific papers is also increasing rapidly. Identifying, within this large pool, the papers with the greatest potential academic impact has therefore received increasing attention. A paper's citation frequency is often used as an objective indicator of its academic influence. Previous studies predict citation frequency from historical citation data and can achieve high accuracy, but such prediction is only possible after the paper has been published for some time. This delay hinders the timely discovery of papers that will be highly cited. In this paper, we propose a novel method for predicting the cited potential of a paper from its metadata and semantic information, which allows the prediction to be made as soon as the paper is published. Specifically, semantic information, such as the abstract, semantic span, and semantic inflection, is extracted to strengthen a machine-learning-based prediction model. To demonstrate the effectiveness and rationality of the cited potential prediction model, we conduct two experiments that validate the model and identify the most effective combination of input information. The experiments show that the prediction accuracy of the proposed model can reach 88% for instant citation prediction.