ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions.

Affiliations

Samuel J. Wood Library and Information Technologies & Services, Weill Cornell Medicine, New York, New York, United States of America.

Information Technologies & Services, Weill Cornell Medicine, New York, New York, United States of America.

Publication information

PLoS One. 2021 Apr 1;16(4):e0244641. doi: 10.1371/journal.pone.0244641. eCollection 2021.

Abstract

Academic institutions need to maintain publication lists for thousands of faculty and other scholars. Automated tools are essential to minimize the need for direct feedback from the scholars themselves, who are practically unable to commit the effort necessary to keep the data accurate. Because they rely exclusively on clustering techniques, author disambiguation applications fail to satisfy key use cases of academic institutions. Algorithms can perfectly group together a set of publications authored by a common individual, but to be useful to an academic institution, they need to programmatically and recurrently map articles to thousands of scholars of interest en masse. Consistent with a savvy librarian's approach to generating a scholar's list of publications, identity-driven authorship prediction is the process of using information about a scholar to quantify the likelihood that the person wrote certain articles. ReCiter is an application that attempts to do exactly that. ReCiter uses institutionally maintained identity data, such as name of department and year of terminal degree, to predict which articles a given scholar has authored. To compute the overall score for a given candidate article from PubMed (and, optionally, Scopus), ReCiter combines three inputs: up to 12 types of commonly available identity data; whether other members of a cluster have been accepted or rejected by a user; and the average score of a cluster. In addition, ReCiter provides scoring and qualitative evidence supporting why particular articles are suggested. This context and confidence scoring allows curators to provide feedback on behalf of scholars more accurately. To help users curate publication lists more efficiently, we used a support vector machine analysis to optimize the scoring of the ReCiter algorithm. In our analysis of a diverse test group of 500 scholars at a private academic medical center, ReCiter correctly predicted 98% of their publications in PubMed.
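The overall score described in the abstract combines per-article identity evidence with cluster-level signals. The Python sketch below shows one way such a combination could work. It is an illustration under stated assumptions and not ReCiter's implementation. The following are all hypothetical placeholders:

- the evidence type names and their weights (`EVIDENCE_WEIGHTS`)
- the `cluster_weight` and `feedback_bonus` parameters
- the acceptance `threshold`
- the additive formula itself

According to the abstract, ReCiter's scoring was tuned with a support vector machine analysis rather than set by hand.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL evidence types and weights, for illustration only.
# ReCiter draws on up to 12 identity evidence types with tuned scoring;
# these names and numbers are placeholders, not its actual values.
EVIDENCE_WEIGHTS = {
    "name_match": 2.0,
    "department_match": 1.0,
    "affiliation_match": 1.5,
    "degree_year_plausible": 0.5,
    "email_match": 3.0,
}


@dataclass
class Candidate:
    pmid: str        # PubMed identifier of the candidate article
    cluster_id: int  # cluster the article was grouped into
    evidence: dict = field(default_factory=dict)  # evidence type -> strength in [0, 1]


def identity_score(c: Candidate) -> float:
    """Weighted sum of identity evidence for one article (unknown types count as 0)."""
    return sum(EVIDENCE_WEIGHTS.get(k, 0.0) * v for k, v in c.evidence.items())


def overall_scores(candidates, accepted, rejected,
                   cluster_weight=0.5, feedback_bonus=2.0):
    """Hypothetical overall score: identity evidence, plus a share of the
    cluster's average identity score, plus net curator feedback on the
    article's cluster mates."""
    base = {c.pmid: identity_score(c) for c in candidates}
    clusters = {}
    for c in candidates:
        clusters.setdefault(c.cluster_id, []).append(c.pmid)

    scores = {}
    for c in candidates:
        members = clusters[c.cluster_id]
        others = [p for p in members if p != c.pmid]
        cluster_avg = sum(base[p] for p in members) / len(members)
        # Accepted cluster mates raise the score; rejected ones lower it.
        net_feedback = sum(p in accepted for p in others) - sum(p in rejected for p in others)
        scores[c.pmid] = base[c.pmid] + cluster_weight * cluster_avg + feedback_bonus * net_feedback
    return scores


def predict(scores, threshold=5.0):
    """Return PMIDs whose overall score meets the (hypothetical) threshold."""
    return sorted(p for p, s in scores.items() if s >= threshold)


if __name__ == "__main__":
    cands = [
        Candidate("1", 1, {"name_match": 1.0, "department_match": 1.0}),
        Candidate("2", 1, {"name_match": 1.0, "email_match": 1.0}),
        Candidate("3", 2, {"name_match": 0.5}),
    ]
    scores = overall_scores(cands, accepted={"2"}, rejected=set())
    print(scores)          # {'1': 7.0, '2': 7.0, '3': 1.5}
    print(predict(scores))  # ['1', '2']
```

In this sketch, article "1" has only modest identity evidence on its own. It still clears the threshold because a curator accepted its cluster mate "2". The feedback count deliberately excludes the article itself, so a decision on one article never feeds back into that same article's score. How ReCiter handles this detail is not specified in the abstract.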

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2430/8016248/40cfb62ddf9a/pone.0244641.g001.jpg
