Johnson Stephen B, Bales Michael E, Dine Daniel, Bakken Suzanne, Albert Paul J, Weng Chunhua
Department of Public Health, Weill Cornell Medical College, New York, United States.
Department of Biomedical Informatics, Columbia University, New York, United States.
J Biomed Inform. 2014 Oct;51:8-14. doi: 10.1016/j.jbi.2014.03.013. Epub 2014 Mar 30.
Publications are a key data source for investigator profiles and research networking systems. We developed ReCiter, an algorithm that automatically extracts bibliographies from PubMed using institutional information about the target investigators.
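The abstract does not spell out how ReCiter builds its initial query; as a rough sketch of the "broad query" step, the snippet below uses NCBI's public E-utilities esearch endpoint to pull PubMed IDs for every article matching an author's surname and first initial. The function name, the retmax default, and the sample investigator are illustrative assumptions, not details from the paper.

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def broad_pubmed_query(last_name: str, first_initial: str, retmax: int = 500) -> list[str]:
    """Return PubMed IDs for every article whose author list matches
    'LastName FI' -- deliberately broad, so it captures the target
    investigator together with same-named authors to be disambiguated later."""
    term = f"{last_name} {first_initial}[Author]"
    params = urllib.parse.urlencode({
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,
    })
    with urllib.request.urlopen(f"{EUTILS}?{params}") as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]

if __name__ == "__main__":
    # Hypothetical investigator; a common surname shows why the raw
    # result set mixes together many distinct author identities.
    pmids = broad_pubmed_query("Johnson", "S")
    print(f"{len(pmids)} candidate citations retrieved")
```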
ReCiter executes a broad query against PubMed, groups the results into clusters that appear to constitute distinct author identities, and selects the cluster that best matches the target investigator. Using information about investigators from one of our institutions, we compared ReCiter results to queries based on author name and institution, and to citations extracted manually from the Scopus database. Five judges created a gold standard from the citations of a random sample of 200 investigators.
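ReCiter's actual clustering and selection rules are described in the full paper, not in this abstract; the sketch below only illustrates the general shape of the approach with a deliberately simple heuristic: group candidate citations by affiliation-word overlap, then pick the cluster whose pooled affiliations best match the target investigator's institutional profile. All names, thresholds, and data are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Citation:
    pmid: str
    affiliation_words: set[str]  # tokens from the article's affiliation string

def cluster_citations(citations: list[Citation], threshold: float = 0.3) -> list[list[Citation]]:
    """Greedy single-pass clustering: a citation joins the first cluster
    whose pooled affiliation vocabulary overlaps it strongly enough
    (Jaccard similarity >= threshold); otherwise it seeds a new cluster.
    Each cluster approximates one distinct author identity."""
    clusters: list[list[Citation]] = []
    vocab: list[set[str]] = []  # pooled affiliation words per cluster
    for c in citations:
        for i, words in enumerate(vocab):
            union = words | c.affiliation_words
            if union and len(words & c.affiliation_words) / len(union) >= threshold:
                clusters[i].append(c)
                vocab[i] = union
                break
        else:
            clusters.append([c])
            vocab.append(set(c.affiliation_words))
    return clusters

def best_cluster(clusters: list[list[Citation]], profile_words: set[str]) -> list[Citation]:
    """Select the cluster whose affiliations share the most words with the
    investigator's institutional profile (department, school, city, ...)."""
    def score(cluster: list[Citation]) -> int:
        pooled = set().union(*(c.affiliation_words for c in cluster))
        return len(pooled & profile_words)
    return max(clusters, key=score)

# Hypothetical usage: two same-named authors at different institutions.
cites = [
    Citation("1", {"weill", "cornell", "medical", "new", "york"}),
    Citation("2", {"cornell", "medical", "college", "new", "york"}),
    Citation("3", {"university", "of", "michigan", "ann", "arbor"}),
]
profile = {"weill", "cornell", "medical", "college", "new", "york"}
winner = best_cluster(cluster_citations(cites), profile)
print([c.pmid for c in winner])  # -> ['1', '2']
```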
About half of the 10,471 potential investigators had no matching citations in PubMed, and about 45% had fewer than 70 citations. Interrater agreement (Fleiss' kappa) for the gold standard was 0.81. Scopus achieved the best recall (sensitivity) at 0.81, while name-based queries achieved 0.78 and ReCiter 0.69. ReCiter attained the best precision (positive predictive value) at 0.93, while Scopus achieved 0.85 and name-based queries 0.31.
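Recall and precision here carry their standard definitions over sets of citations; a minimal worked illustration follows (the PMID sets are made up for the example, not the study's data):

```python
def recall_precision(retrieved: set[str], gold: set[str]) -> tuple[float, float]:
    """Recall (sensitivity): fraction of gold-standard citations retrieved.
    Precision (positive predictive value): fraction of retrieved citations
    that belong to the gold standard."""
    true_pos = len(retrieved & gold)
    return true_pos / len(gold), true_pos / len(retrieved)

# Hypothetical PMID sets for one investigator.
gold = {"101", "102", "103", "104", "105"}
retrieved = {"101", "102", "103", "999"}  # three correct, one spurious
r, p = recall_precision(retrieved, gold)
print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.60 precision=0.75
```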
ReCiter accesses the most current citation data, uses limited computational resources, and minimizes manual entry by investigators. Generating bibliographies with name-based queries alone will not yield high accuracy. Proprietary databases can perform well but require manual effort. Automated generation with higher recall is possible but requires additional knowledge about investigators.