
Author Name Disambiguation in MEDLINE.

Author information

Torvik Vetle I, Smalheiser Neil R

Affiliation

University of Illinois at Chicago.

Publication information

ACM Trans Knowl Discov Data. 2009 Jul 1;3(3). doi: 10.1145/1552303.1552304.

DOI: 10.1145/1552303.1552304
PMID: 20072710
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2805000/
Abstract

BACKGROUND: We recently described "Author-ity," a model for estimating the probability that two articles in MEDLINE, sharing the same author name, were written by the same individual. Features include shared title words, journal name, coauthors, medical subject headings, language, affiliations, and author name features (middle initial, suffix, and prevalence in MEDLINE). Here we test the hypothesis that the Author-ity model will suffice to disambiguate author names for the vast majority of articles in MEDLINE.
METHODS: Enhancements include: (a) incorporating first names and their variants, email addresses, and correlations between specific last names and affiliation words; (b) new methods of generating large unbiased training sets; (c) new methods for estimating the prior probability; (d) a weighted least squares algorithm for correcting transitivity violations; and (e) a maximum likelihood based agglomerative algorithm for computing clusters of articles that represent inferred author-individuals.
RESULTS: Pairwise comparisons were computed for all author names on all 15.3 million articles in MEDLINE (2006 baseline), that share last name and first initial, to create Author-ity 2006, a database that has each name on each article assigned to one of 6.7 million inferred author-individual clusters. Recall is estimated at ~98.8%. Lumping (putting two different individuals into the same cluster) affects ~0.5% of clusters, whereas splitting (assigning articles written by the same individual to >1 cluster) affects ~2% of articles.
IMPACT: The Author-ity model can be applied generally to other bibliographic databases. Author name disambiguation allows information retrieval and data integration to become person-centered, not just document-centered, setting the stage for new data mining and social network tools that will facilitate the analysis of scholarly publishing and collaboration behavior.
AVAILABILITY: The Author-ity 2006 database is available for nonprofit academic research, and can be freely queried via http://arrowsmith.psych.uic.edu.
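
The clustering step named in the abstract (pairwise same-author probabilities fed into a likelihood-based agglomerative algorithm) can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified greedy variant that assumes pairwise probabilities are already available, skips the weighted least squares correction of transitivity violations, and merges two clusters whenever doing so raises the log-likelihood of the partition. The names `cluster_articles`, `pair_prob`, and `example_probs`, and all probability values, are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def log_odds(p, eps=1e-9):
    """Log odds of a probability, clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def cluster_articles(n_articles, pair_prob, default_prob=0.5):
    """Greedy agglomerative clustering from pairwise probabilities.

    pair_prob maps (i, j) with i < j to the estimated probability that
    articles i and j were written by the same individual. Missing pairs
    fall back to default_prob (0.5 is neutral: log odds of zero).
    Illustrative assumption: merging clusters A and B changes the
    partition log-likelihood by the sum of log odds over all cross pairs.
    """
    clusters = [{i} for i in range(n_articles)]

    def prob(i, j):
        key = (i, j) if i < j else (j, i)
        return pair_prob.get(key, default_prob)

    def merge_gain(a, b):
        return sum(log_odds(prob(i, j)) for i in a for j in b)

    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for x, y in combinations(range(len(clusters)), 2):
            gain = merge_gain(clusters[x], clusters[y])
            if gain > best_gain:
                best_gain, best_pair = gain, (x, y)
        if best_pair is None:  # no merge improves the likelihood
            break
        x, y = best_pair       # x < y, so deleting y keeps x valid
        clusters[x] |= clusters[y]
        del clusters[y]
    return clusters


# Made-up probabilities for four articles sharing one author name.
# Pairs (0, 1) and (1, 2) look like the same person while (0, 2) does
# not, which is a transitivity violation the clustering has to resolve.
example_probs = {
    (0, 1): 0.95, (1, 2): 0.90, (0, 2): 0.30,
    (0, 3): 0.02, (1, 3): 0.05, (2, 3): 0.10,
}
clusters = cluster_articles(4, example_probs)
print(sorted(sorted(c) for c in clusters))
```

In this toy input the strong evidence for pairs (0, 1) and (1, 2) outweighs the weak evidence against (0, 2), so articles 0, 1, and 2 end up in one inferred author-individual and article 3 in another. In the abstract's terms, an erroneous merge of this kind would count as lumping, and a missed merge as splitting.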
