NorHead Norwegian Centre for Headache Research, Trondheim, Norway.
Department of Computer Science, NTNU Norwegian University of Science and Technology, Trondheim, Norway.
Cephalalgia. 2024 May;44(5):3331024241251488. doi: 10.1177/03331024241251488.
We aimed to develop the first machine learning models to predict the citation counts and translational impact of headache research, with translational impact defined as inclusion in guidelines or policy documents, and to assess which factors are most predictive.
Bibliometric data and the titles, abstracts, and keywords of 8600 publications in three headache-oriented journals, from each journal's inception to 31 December 2017, were used. A series of machine learning models were implemented to predict which of three 5-year citation count intervals (0-5, 6-14 and >14 citations) a publication falls into, and to predict the translational impact of a publication. Models were evaluated out-of-sample with the area under the receiver operating characteristic curve (AUC).
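The two evaluation ingredients described above, binning 5-year citation counts into three classes and scoring predictions with AUC, can be illustrated with a minimal sketch. The function names below are hypothetical. The abstract does not state how AUC was summarised over three classes, so the one-vs-rest macro average used here is an assumption. The sketch computes only the metric; "out-of-sample" means it would be applied to predictions on publications held out from model training.

```python
from typing import Sequence


def citation_class(count: int) -> int:
    """Hypothetical helper: map a 5-year citation count onto the three
    intervals used in the abstract: 0 for 0-5, 1 for 6-14, 2 for >14."""
    if count < 0:
        raise ValueError("citation count must be non-negative")
    if count <= 5:
        return 0
    if count <= 14:
        return 1
    return 2


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count one half).
    Labels are 1 for positive and 0 for negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: Sequence[Sequence[float]],
                  classes: Sequence[int]) -> float:
    """Assumed multi-class summary: one-vs-rest AUC for each class,
    using that class's predicted probability as the score, then averaged."""
    n_classes = len(probs[0])
    aucs = []
    for k in range(n_classes):
        scores = [row[k] for row in probs]
        labels = [1 if c == k else 0 for c in classes]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / n_classes


if __name__ == "__main__":
    counts = [2, 9, 30]
    classes = [citation_class(c) for c in counts]  # [0, 1, 2]
    # Illustrative predicted class probabilities, not data from the study.
    probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    print(classes, macro_ovr_auc(probs, classes))
```

For the binary translational-impact task, `binary_auc` applies directly, with label 1 marking inclusion in a guideline or policy document. The pairwise loop is quadratic in sample size and is chosen here for readability; a rank-based formula gives the same value more efficiently.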
The top-performing gradient boosting model predicted the correct citation count class with an out-of-sample AUC of 0.81. Bibliometric data such as page count, number of references, first and last author citation counts, and h-index were among the most important predictors. Prediction of translational impact worked best when both bibliometric data and information from the title, abstract and keywords were included, reaching an out-of-sample AUC of 0.71 for the top-performing random forest model.
Citation counts are best predicted by bibliometric data, while models incorporating both bibliometric data and publication content best identify the translational impact of headache research.