Kanakaris Nikos, Giarelis Nikolaos, Siachos Ilias, Karacapilidis Nikos
Industrial Management and Information Systems Lab, MEAD, University of Patras, 26504 Rio Patras, Greece.
Entropy (Basel). 2021 May 25;23(6):664. doi: 10.3390/e23060664.
We treat the prediction of future research collaborations as a link prediction problem on a scientific knowledge graph. To the best of our knowledge, this is the first work on predicting future research collaborations that combines the structural and textual information of a scientific knowledge graph through a purposeful integration of graph algorithms and natural language processing techniques. Our work: (i) investigates whether integrating unstructured textual data into a single knowledge graph affects the performance of a link prediction model, (ii) studies the effect of previously proposed graph-kernel-based approaches on the performance of a machine learning (ML) model for the link prediction problem, and (iii) proposes a three-phase pipeline that exploits structural and textual information as well as pre-trained word embeddings. We benchmark the proposed approach against classical link prediction algorithms, using accuracy, recall, and precision as performance metrics. Finally, we empirically test our approach on the link prediction problem with various feature combinations. Our experiments with the new COVID-19 Open Research Dataset demonstrate a significant improvement in these performance metrics for the prediction of future research collaborations.
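To make the evaluation setup concrete, the following is a minimal sketch of a classical link prediction baseline scored with accuracy, precision, and recall. It is not the authors' three-phase pipeline. The abstract does not name the classical algorithms used, so the Jaccard coefficient is chosen here as one common example. The toy co-authorship graph, the held-out "future" edges, and the 0.3 threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy co-authorship graph; node names are placeholders.
observed_edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("C", "D"), ("D", "E"), ("E", "F"),
]
# Hypothetical future collaborations, used only to score the predictions.
future_edges = {frozenset(("A", "D")), frozenset(("B", "D"))}


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Jaccard coefficient: shared neighbours over all neighbours of u and v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def predict_links(adj, threshold):
    """Label every currently unconnected pair as a predicted link or not."""
    preds = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already collaborators, not a candidate
        preds[frozenset((u, v))] = jaccard(adj, u, v) >= threshold
    return preds


def evaluate(preds, positives):
    """Compute accuracy, precision, and recall over the candidate pairs."""
    tp = sum(1 for pair, yes in preds.items() if yes and pair in positives)
    fp = sum(1 for pair, yes in preds.items() if yes and pair not in positives)
    fn = sum(1 for pair, yes in preds.items() if not yes and pair in positives)
    tn = len(preds) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(preds) if preds else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


adjacency = build_adjacency(observed_edges)
predictions = predict_links(adjacency, threshold=0.3)  # threshold is an assumption
metrics = evaluate(predictions, future_edges)
print(metrics)
```

A learned model would replace the fixed threshold with a classifier trained on such pairwise scores, possibly alongside text-derived features, but the three metrics would be computed in the same way.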