Shahid Abdul, Afzal Muhammad Tanvir, Alharbi Abdullah, Aljuaid Hanan, Al-Otaibi Shaha
Institute of Computing, Kohat University of Science & Technology, Kohat, Pakistan.
Department of Computer Science, NAMAL Institute, Mianwali, Pakistan.
PeerJ Comput Sci. 2021 Jun 4;7:e524. doi: 10.7717/peerj-cs.524. eCollection 2021.
For the past half century, identifying relevant documents has been an active area of research, driven by the rapid growth of data on the web. Traditional models for retrieving relevant documents rely on bibliographic information, such as bibliographic coupling, co-citation, and direct citation. More recently, however, the scientific community has begun to employ textual features to improve the accuracy of existing models. In our previous study, we found that analyzing citations at a deep level (i.e., the content level) can play a paramount role in finding more relevant documents than analysis at the surface level (i.e., bibliographic details alone). We found that cited and citing papers have a high degree of relevancy when the cited paper is cited more than five times in the citing paper's text. This paper extends our previous study by evaluating the approach on a comprehensive dataset. Moreover, the results are compared with other state-of-the-art approaches, i.e., content-, metadata-, and bibliography-based techniques. For evaluation, a user study was conducted on papers selected from 1,200 documents (comprising about 16,000 references) of an online journal, Journal of Computer Science (J.UCS). The evaluation results indicate that in-text citation frequency attains higher precision in finding relevant papers than other state-of-the-art techniques such as content-, bibliographic coupling-, and metadata-based techniques. The use of in-text citations may help enhance the quality of existing information systems and digital libraries. Further, more sophisticated measures may be defined by considering the use of in-text citations.
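The in-text citation frequency criterion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes numeric bracketed citation markers such as [3] or [2, 5] (ranges like [2-5] are not expanded), and the function names `in_text_citation_counts`, `strongly_related`, and `precision` are hypothetical. The only detail taken from the abstract is the cutoff of more than five in-text citations.

```python
import re
from collections import Counter

# Assumption: citations appear as numeric bracketed markers, e.g. [3] or [2, 5].
CITATION_MARKER = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def in_text_citation_counts(body_text):
    """Count how many times each reference number is cited in the body text."""
    counts = Counter()
    for match in CITATION_MARKER.finditer(body_text):
        for ref in match.group(1).split(","):
            counts[int(ref.strip())] += 1
    return counts


def strongly_related(counts, threshold=5):
    """Return reference numbers cited more than `threshold` times."""
    return sorted(ref for ref, n in counts.items() if n > threshold)


def precision(recommended, relevant):
    """Fraction of recommended items that users judged relevant."""
    recommended = set(recommended)
    if not recommended:
        return 0.0
    return len(recommended & set(relevant)) / len(recommended)


if __name__ == "__main__":
    # Toy text: reference 1 is cited seven times, reference 2 once.
    body = "Intro [1]. Related work [1, 2]. " + "Method follows [1]. " * 5
    counts = in_text_citation_counts(body)
    print(strongly_related(counts))
    print(precision(recommended=[1, 2], relevant=[1]))
```

In a real pipeline the citation markers would first have to be resolved to reference-list entries, and the relevance judgments passed to `precision` would come from a user study such as the one the abstract describes.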