Department of Computer Science, National Textile University, Faisalabad, Pakistan.
Punjab University College of Information Technology (PUCIT), University of the Punjab (PU), Lahore, Pakistan.
PLoS One. 2020 Mar 5;15(3):e0228885. doi: 10.1371/journal.pone.0228885. eCollection 2020.
A citation is deemed a potential parameter for determining the linkage between research articles. The parameter has been extensively employed in multifarious academic tasks, such as calculating the impact factor of journals and the h-index of researchers, allocating research grants, and identifying the latest research trends. The current state of the art contends that not all citations are of equal importance. Based on this argument, the current trend in the citation classification community is to categorize citations as important or non-important. The community has proposed different approaches to identifying important citations, such as citation-count-based, context-based, metadata-based, and text-based approaches. However, the contemporary state of the art in citation classification ignores potentially significant features that can play a vital role in citation classification. This research presents a novel approach for binary citation classification that exploits section-wise in-text citation frequencies, a similarity score, and overall citation-count-based features. The study also introduces a novel machine-learning-based approach for assigning appropriate weights to the logical sections of research papers; each citation is then weighted according to the section in which it appears. To perform the classification, we used three classification techniques: Support Vector Machine, Kernel Linear Regression, and Random Forest. The experiments were performed on two annotated benchmark datasets containing 465 and 311 citation pairs of research articles, respectively. The results revealed that the proposed approach attained improved precision over the contemporary state-of-the-art approach (0.84 vs. 0.72).
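To make the feature set concrete, the following is a minimal sketch of how the three feature families named in the abstract could be assembled for one citing/cited pair, plus the precision metric used to report results. It is illustrative only, under stated assumptions: the section names and the values in `SECTION_WEIGHTS` are hypothetical (the study learns its section weights with machine learning, and those weights are not given here), and bag-of-words cosine similarity is a stand-in chosen for illustration, not necessarily the similarity measure the authors used. The resulting vectors would then be passed to a classifier such as an SVM or Random Forest, which is not shown.

```python
from collections import Counter
import math

# HYPOTHETICAL section weights for illustration only; the study learns
# its own weights with machine learning, and these values are not from it.
SECTION_WEIGHTS = {
    "introduction": 0.5,
    "related_work": 0.5,
    "methodology": 1.0,
    "results": 1.0,
    "discussion": 0.75,
}

def weighted_section_score(section_counts, weights=SECTION_WEIGHTS):
    """Sum the in-text citation count of each section, scaled by that section's weight.

    Sections missing from `weights` contribute nothing.
    """
    return sum(weights.get(section, 0.0) * count
               for section, count in section_counts.items())

def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity (an assumed stand-in for the paper's similarity score)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[word] * b[word] for word in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def feature_vector(section_counts, citing_text, cited_text):
    """Return [weighted section-wise frequency, similarity score, overall in-text citation count]."""
    total = sum(section_counts.values())
    return [
        weighted_section_score(section_counts),
        cosine_similarity(citing_text, cited_text),
        float(total),
    ]

def precision(y_true, y_pred):
    """Precision = TP / (TP + FP), treating label 1 as 'important'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

if __name__ == "__main__":
    counts = {"introduction": 1, "methodology": 3, "results": 2}
    print(feature_vector(counts,
                         "citation classification with svm",
                         "citation classification using random forest"))
    print(precision([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))
```

With the hypothetical weights above, a citation appearing once in the introduction, three times in the methodology, and twice in the results gets a weighted score of 0.5 + 3.0 + 2.0 = 5.5, so citations concentrated in heavily weighted sections score higher than the same number of citations in lightly weighted ones.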