Automatic generation of investigator bibliographies for institutional research networking systems.

Authors

Johnson Stephen B, Bales Michael E, Dine Daniel, Bakken Suzanne, Albert Paul J, Weng Chunhua

Affiliations

Department of Public Health, Weill Cornell Medical College, New York, United States.

Department of Biomedical Informatics, Columbia University, New York, United States.

Publication information

J Biomed Inform. 2014 Oct;51:8-14. doi: 10.1016/j.jbi.2014.03.013. Epub 2014 Mar 30.

DOI: 10.1016/j.jbi.2014.03.013
PMID: 24694772
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4180817/

Abstract

OBJECTIVE

Publications are a key data source for investigator profiles and research networking systems. We developed ReCiter, an algorithm that automatically extracts bibliographies from PubMed using institutional information about the target investigators.

METHODS

ReCiter executes a broad query against PubMed, groups the results into clusters that appear to constitute distinct author identities and selects the cluster that best matches the target investigator. Using information about investigators from one of our institutions, we compared ReCiter results to queries based on author name and institution and to citations extracted manually from the Scopus database. Five judges created a gold standard using citations of a random sample of 200 investigators.
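The cluster-then-select idea described above can be illustrated with a minimal Python sketch. This is not the ReCiter implementation: the abstract does not say which features drive clustering or matching, so the shared-coauthor clustering rule, the `Citation` fields, and the evidence-counting score below are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical record for one candidate citation returned by a broad name query."""
    pmid: str
    coauthors: frozenset  # coauthor names, excluding the target investigator's own name
    affiliation: str = ""


def cluster_by_shared_coauthors(citations):
    """Assumed rule: citations sharing at least one coauthor belong to one identity."""
    parent = list(range(len(citations)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # coauthor name -> index of the first citation listing that name
    for i, citation in enumerate(citations):
        for name in citation.coauthors:
            if name in first_seen:
                parent[find(i)] = find(first_seen[name])
            else:
                first_seen[name] = i

    clusters = {}
    for i, citation in enumerate(citations):
        clusters.setdefault(find(i), []).append(citation)
    return list(clusters.values())


def match_score(cluster, institution_keywords, known_colleagues):
    """Count citations showing institutional evidence (assumed scoring rule)."""
    score = 0
    for citation in cluster:
        affiliation = citation.affiliation.lower()
        has_keyword = any(k.lower() in affiliation for k in institution_keywords)
        has_colleague = bool(citation.coauthors & known_colleagues)
        if has_keyword or has_colleague:
            score += 1
    return score


def select_cluster(citations, institution_keywords, known_colleagues):
    """Return the cluster that best matches what the institution knows about the target."""
    clusters = cluster_by_shared_coauthors(citations)
    if not clusters:
        return []
    return max(
        clusters,
        key=lambda c: match_score(c, institution_keywords, known_colleagues),
    )
```

In this sketch, the broad query would supply `citations`, and the institution's records would supply `institution_keywords` and `known_colleagues`. Choosing a single best cluster is one plausible way such a design could trade recall for precision, since correct citations that land in other clusters are discarded.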

RESULTS

About half of the 10,471 potential investigators had no matching citations in PubMed, and about 45% had fewer than 70 citations. Interrater agreement (Fleiss' kappa) for the gold standard was 0.81. Scopus achieved the best recall (sensitivity) of 0.81, while name-based queries had 0.78 and ReCiter had 0.69. ReCiter attained the best precision (positive predictive value) of 0.93 while Scopus had 0.85 and name-based queries had 0.31.
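For readers who want to relate these figures to each other, the sketch below shows how precision and recall are computed from a retrieved set and a gold-standard set, and derives an F1 score from the reported values. The F1 scores are not reported in the abstract; they are computed here from the published precision and recall only as an illustrative summary, and the function and variable names are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved citation set against a gold standard."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract.
REPORTED = {
    "ReCiter": (0.93, 0.69),
    "Scopus": (0.85, 0.81),
    "name-based query": (0.31, 0.78),
}

if __name__ == "__main__":
    for method, (p, r) in REPORTED.items():
        print(f"{method}: precision={p:.2f} recall={r:.2f} derived F1={f1(p, r):.2f}")
```

Under this derived measure, Scopus (about 0.83) and ReCiter (about 0.79) land close together, while name-based queries (about 0.44) fall well behind because of their low precision.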

DISCUSSION

ReCiter accesses the most current citation data, uses limited computational resources and minimizes manual entry by investigators. Generation of bibliographies using name-based queries will not yield high accuracy. Proprietary databases can perform well but require manual effort. Automated generation with higher recall is possible but requires additional knowledge about investigators.
