What predicts citation counts and translational impact in headache research? A machine learning analysis.

Author information

NorHead Norwegian Centre for Headache Research, Trondheim, Norway.

Department of Computer Science, NTNU Norwegian University of Science and Technology, Trondheim, Norway.

Publication information

Cephalalgia. 2024 May;44(5):3331024241251488. doi: 10.1177/03331024241251488.

PMID:38690640

Abstract

BACKGROUND

We aimed to develop the first machine learning models to predict citation counts and the translational impact, defined as inclusion in guidelines or policy documents, of headache research, and assess which factors are most predictive.

METHODS

Bibliometric data and the titles, abstracts, and keywords from 8600 publications in three headache-oriented journals, from each journal's inception to 31 December 2017, were used. A series of machine learning models was implemented to predict two outcomes: the 5-year citation count class of a publication (0-5, 6-14, or >14 citations) and its translational impact. Models were evaluated out-of-sample using the area under the receiver operating characteristic curve (AUC).
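The class definition and the evaluation metric can be illustrated with a minimal, standard-library-only sketch. It maps 5-year citation counts to the three intervals named above and computes a one-vs-rest AUC averaged over the classes. The abstract does not say how the multiclass AUC was aggregated, so the unweighted (macro) average is an assumption, and the function names, citation counts, and predicted probabilities below are hypothetical toy values, not data or code from the study.

```python
from typing import List, Sequence


def citation_class(count: int) -> int:
    """Map a 5-year citation count to one of the three intervals in the abstract."""
    if count < 0:
        raise ValueError("citation count must be non-negative")
    if count <= 5:
        return 0  # 0-5 citations
    if count <= 14:
        return 1  # 6-14 citations
    return 2  # >14 citations


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation of the AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: Sequence[Sequence[float]], classes: Sequence[int],
                  n_classes: int = 3) -> float:
    """Unweighted mean of one-vs-rest AUCs (assumed aggregation, see lead-in)."""
    aucs: List[float] = []
    for k in range(n_classes):
        scores = [p[k] for p in probs]
        labels = [1 if c == k else 0 for c in classes]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / n_classes


# Hypothetical held-out publications: observed 5-year citation counts ...
counts = [2, 0, 8, 12, 20, 31]
classes = [citation_class(c) for c in counts]

# ... and hypothetical predicted class probabilities from some fitted model.
probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],  # a 6-14 paper the model leans towards the >14 class
    [0.1, 0.3, 0.6],
    [0.1, 0.2, 0.7],
]

macro_auc = macro_ovr_auc(probs, classes)
print(f"classes: {classes}, macro one-vs-rest AUC: {macro_auc:.3f}")
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for a toy illustration; a rank-based implementation would be preferable at the scale of several thousand publications.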

RESULTS

The top-performing gradient boosting model predicted the correct citation count class with an out-of-sample AUC of 0.81. Bibliometric data such as page count, number of references, first and last author citation counts, and h-index were among the most important predictors. Prediction of translational impact worked best when including both bibliometric data and information from the title, abstract, and keywords, reaching an out-of-sample AUC of 0.71 for the top-performing random forest model.

CONCLUSION

Citation counts are best predicted by bibliometric data, while models incorporating both bibliometric data and publication content identify the translational impact of headache research.
