A new approach and gold standard toward author disambiguation in MEDLINE.

Affiliations

Roche Pharmaceutical Research and Early Development, pRED Informatics, Roche Innovation Center, Basel, Switzerland.

Institute of Computational Linguistics, University of Zurich, Switzerland.

Publication information

J Am Med Inform Assoc. 2019 Oct 1;26(10):1037-1045. doi: 10.1093/jamia/ocz028.

DOI: 10.1093/jamia/ocz028
PMID: 30958542
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7647200/
Abstract

OBJECTIVE

Author-centric analyses of fast-growing biomedical reference databases are challenging because of author name ambiguity. This problem has mainly been addressed through author disambiguation using supervised machine-learning algorithms. Such algorithms, however, require adequately designed gold standards that properly reflect the reference database. In this study, we used MEDLINE to build the first unbiased gold standard for a reference database and to improve on the existing state of the art in author disambiguation.

MATERIALS AND METHODS

Following a new corpus design method, publication pairs randomly picked from MEDLINE were evaluated both by crowdsourcing and by expert curators. Because the expert curators proved more accurate than crowdsourcing, they were tasked with creating the full corpus. The corpus was then used to explore new features, not discoverable with previously existing gold standards, that could improve state-of-the-art author disambiguation algorithms.
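The pair-based design above can be illustrated with a short Python sketch of how candidate publication pairs might be drawn and turned into features for a pairwise classifier. This is not the authors' code (theirs is in the repository linked at the end of the abstract): the `Publication` record, the last-name-plus-first-initial blocking key, the `sample_pairs` and `pair_features` helpers, and the four similarity features are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Publication:
    # Hypothetical, heavily simplified record; real MEDLINE records carry many more fields.
    pmid: int
    last_name: str
    first_name: str
    year: int
    affiliation: str = ""
    coauthors: frozenset = frozenset()
    mesh: frozenset = frozenset()


def name_key(pub):
    # Assumed blocking key: lower-cased last name plus first initial, e.g. "smith_j".
    return f"{pub.last_name.lower()}_{pub.first_name[:1].lower()}"


def sample_pairs(pubs, n_pairs, seed=0):
    """Draw up to n_pairs random pairs of publications whose authors share a name key."""
    blocks = {}
    for pub in pubs:
        blocks.setdefault(name_key(pub), []).append(pub)
    candidates = [pair for block in blocks.values() for pair in combinations(block, 2)]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(candidates, min(n_pairs, len(candidates)))


def jaccard(a, b):
    # Set overlap in [0, 1]; defined as 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(p, q):
    """Illustrative similarity features a pairwise classifier could consume."""
    return {
        "coauthor_jaccard": jaccard(p.coauthors, q.coauthors),
        "mesh_jaccard": jaccard(p.mesh, q.mesh),
        "year_gap": abs(p.year - q.year),
        "same_affiliation": float(bool(p.affiliation) and p.affiliation == q.affiliation),
    }


if __name__ == "__main__":
    pubs = [
        Publication(1, "Smith", "John", 2010, "Univ A", frozenset({"Lee K"}), frozenset({"Asthma"})),
        Publication(2, "Smith", "Jane", 2015, "Univ B",
                    frozenset({"Lee K", "Wu H"}), frozenset({"Asthma", "Child"})),
        Publication(3, "Meier", "Anna", 2012),
    ]
    # Only publications 1 and 2 share the key "smith_j", so exactly one pair is drawn.
    for p, q in sample_pairs(pubs, n_pairs=10):
        print(p.pmid, q.pmid, pair_features(p, q))
```

Blocking on last name plus first initial is what makes such pairs ambiguous in the first place: John and Jane Smith collide under `smith_j`, and whether a pair like this is one person or two is the kind of judgment the curators supply as a label. A real pipeline would feed feature dictionaries like these, together with those labels, to a supervised classifier.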

RESULTS

We created a gold standard of 1,900 publication pairs that closely matches MEDLINE in chronological distribution and information completeness. A machine-learning algorithm that includes new features related to the ethnic origin of authors showed significant improvements over the current state of the art, demonstrating that realistic gold standards are needed to further develop effective author disambiguation algorithms.
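A claim such as "close similarity to MEDLINE in terms of chronological distribution" can be checked by comparing the share of publications per year in the sample with the share in the full database. The sketch below assumes total variation distance as the comparison metric and uses made-up year lists; neither the metric nor the helper names (`year_distribution`, `total_variation`) come from the study.

```python
from collections import Counter


def year_distribution(years):
    """Turn a non-empty list of publication years into per-year proportions."""
    counts = Counter(years)
    total = sum(counts.values())
    return {year: n / total for year, n in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0.0 for identical distributions, 1.0 for disjoint ones."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(y, 0.0) - q.get(y, 0.0)) for y in support)


if __name__ == "__main__":
    # Toy year lists for illustration only; these are not MEDLINE statistics.
    database = year_distribution([2000] * 20 + [2005] * 30 + [2010] * 50)
    representative = year_distribution([2000] * 2 + [2005] * 3 + [2010] * 5)
    recent_only = year_distribution([2010] * 10)
    print(total_variation(database, representative))  # same proportions -> 0.0
    print(total_variation(database, recent_only))      # skewed sample -> larger distance
```

A sample drawn uniformly at random from the database should land close to 0.0, whereas a benchmark assembled from, say, only recent publications drifts toward larger values; the same per-field comparison could be repeated for information completeness (for example, the share of records with an affiliation).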

DISCUSSION AND CONCLUSION

An unbiased gold standard can give a more accurate picture of the status of author disambiguation research and help discover new features for machine learning. The principles and methods shown here can be applied to reference databases other than MEDLINE. The gold standard and code used for this study are available at https://github.com/amorgani/AND/.
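Because the gold standard consists of labeled publication pairs, one straightforward way to score an algorithm against it is pairwise precision, recall and F1 on the "same author" decision. The sketch below assumes each gold pair carries a boolean label and that predictions arrive as a parallel list of booleans; the `pairwise_scores` helper and this metric choice are illustrative assumptions, not necessarily the evaluation protocol used in the paper.

```python
def pairwise_scores(gold, predicted):
    """Precision, recall and F1 for the positive class ("same author") over labeled pairs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)      # correctly merged pairs
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)  # wrongly merged pairs
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)  # wrongly split pairs
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels: True means both publications in the pair belong to the same person.
    gold = [True, True, False, False, True]
    predicted = [True, False, False, True, True]
    print(pairwise_scores(gold, predicted))
```

In this toy case two of the three predicted matches are correct and two of the three true matches are found, so precision, recall and F1 all equal 2/3. Since the scores depend entirely on which pairs the gold standard contains, a benchmark whose pairs do not resemble the database can overstate or understate real-world performance.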
