Citing a Data Repository: A Case Study of the Protein Data Bank.

Author information

Huang Yi-Hung, Rose Peter W, Hsu Chun-Nan

Affiliations

Department of Computer Science, National Taiwan University, Taipei 106, Taiwan; Intel-NTU Connected Context Computing Center, National Taiwan University, Taipei 106, Taiwan.

RCSB Protein Data Bank, San Diego Supercomputer Center, UC San Diego, La Jolla, CA 92093, United States of America.

Publication information

PLoS One. 2015 Aug 28;10(8):e0136631. doi: 10.1371/journal.pone.0136631. eCollection 2015.

Abstract

The Protein Data Bank (PDB) is the worldwide repository of 3D structures of proteins, nucleic acids and complex assemblies. The PDB's large corpus of data (> 100,000 structures) and related citations provide a well-organized and extensive test set for developing and understanding data citation and access metrics. In this paper, we present a systematic investigation of how authors cite PDB as a data repository. We describe a novel metric based on information cascade constructed by exploring the citation network to measure influence between competing works and apply that to analyze different data citation practices to PDB. Based on this new metric, we found that the original publication of RCSB PDB in the year 2000 continues to attract most citations though many follow-up updates were published. None of these follow-up publications by members of the wwPDB organization can compete with the original publication in terms of citations and influence. Meanwhile, authors increasingly choose to use URLs of PDB in the text instead of citing PDB papers, leading to disruption of the growth of the literature citations. A comparison of data usage statistics and paper citations shows that PDB Web access is highly correlated with URL mentions in the text. The results reveal the trend of how authors cite a biomedical data repository and may provide useful insight of how to measure the impact of a data repository.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/59c7/4552849/5a760851e360/pone.0136631.g001.jpg
