Achakulvisut Titipat, Acuna Daniel E, Ruangrong Tulakan, Kording Konrad
Department of Biomedical Engineering, Northwestern University, Chicago, Illinois, United States of America.
School of Information Studies, Syracuse University, Syracuse, New York, United States of America.
PLoS One. 2016 Jul 6;11(7):e0158423. doi: 10.1371/journal.pone.0158423. eCollection 2016.
Finding relevant publications is important for scientists, who must cope with an exponentially growing volume of scholarly material. Algorithms can help with this task, as they do for music, movie, and product recommendations. However, we know little about how well these algorithms perform on scholarly material. Here, we develop an algorithm, and an accompanying Python library, that implements a recommendation system based on the content of articles. The design principles are to adapt to new content, provide near-real-time suggestions, and be open source. We tested the library on 15K posters from the Society for Neuroscience Conference 2015. Human-curated topics are used to cross-validate the algorithm's parameters and to produce a similarity metric that maximally correlates with human judgments. We show that our algorithm significantly outperforms suggestions based on keywords. The work presented here promises to make the exploration of scholarly material faster and more accurate.
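As an illustration of what a content-based recommender can look like, here is a minimal sketch in plain Python. It is not the authors' library or its API. It assumes TF-IDF term weighting and cosine similarity, which are common choices but are not specified in the abstract. The function names (`tfidf_vectors`, `recommend`) and the four toy abstracts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption): ubiquitous terms keep a small weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def recommend(liked, vectors, top_k=3):
    """Rank documents not in `liked` by similarity to the mean liked vector."""
    profile = Counter()
    for i in liked:
        for t, w in vectors[i].items():
            profile[t] += w / len(liked)
    norm = math.sqrt(sum(w * w for w in profile.values())) or 1.0
    profile = {t: w / norm for t, w in profile.items()}
    scores = [(cosine(profile, vec), i)
              for i, vec in enumerate(vectors) if i not in liked]
    # Highest similarity first; ties are broken by document index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:top_k]]


# Hypothetical toy abstracts, for illustration only.
abstracts = [
    "dopamine neurons encode reward prediction error in the midbrain",
    "reward prediction error signals in dopamine neurons during learning",
    "retinal ganglion cells respond to visual motion",
    "visual motion processing in cortical area MT",
]
vectors = tfidf_vectors(abstracts)
print(recommend(liked=[0], vectors=vectors, top_k=2))
```

Because the vectors depend only on the text, a newly added item can be recommended as soon as it is vectorised, with no usage history required. This is one plausible reading of the "adapt to new content" design principle.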