Measuring data rot: An analysis of the continued availability of shared data from a Single University.

Affiliation

Caltech Library, California Institute of Technology, Pasadena, CA, United States of America.

Publication information

PLoS One. 2024 Jun 5;19(6):e0304781. doi: 10.1371/journal.pone.0304781. eCollection 2024.

DOI:10.1371/journal.pone.0304781
PMID:38838010
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11152257/
Abstract

To determine where data is shared and what data is no longer available, this study analyzed data shared by researchers at a single university. A total of 2166 supplemental data links were harvested from the university's institutional repository and web scraped using R. All links that failed to scrape or could not be tested algorithmically were tested for availability by hand. Trends in data availability by link type, age of publication, and data source were examined for patterns. Results show that researchers shared data in hundreds of places. About two-thirds of links to shared data were in the form of URLs and one-third were DOIs, with several FTP links and links directly to files. A surprising 13.4% of shared URL links pointed to a website homepage rather than a specific record on a website. After testing, 5.4% of the 2166 supplemental data links were found to be no longer available. DOIs were the type of shared link least likely to disappear, with a 1.7% loss, compared with a URL loss of 5.9% averaged over time. Links from older publications were more likely to be unavailable, with a data disappearance rate estimated at 2.6% per year, as were links to data hosted on journal websites. The results support best practice guidance to share data in a data repository using a permanent identifier.
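
The link categories named in the abstract (URL, DOI, FTP, direct file link, and URLs that point only to a homepage) can be illustrated with a minimal sketch. This is not the study's code: the authors worked in R, and the function names `classify_link` and `summarize`, the DOI resolver hosts, and the file-extension list are assumptions made here for illustration. In the abstract, homepage links are a subset of URL links; the sketch reports them separately only to make that case visible.

```python
from collections import Counter
from urllib.parse import urlparse

# Illustrative assumptions, not taken from the study.
DOI_HOSTS = ("doi.org", "dx.doi.org")
FILE_EXTENSIONS = (".csv", ".zip", ".txt", ".xlsx", ".pdf", ".tar.gz")


def classify_link(link: str) -> str:
    """Return 'ftp', 'doi', 'file', 'homepage', or 'url' for a shared-data link."""
    parsed = urlparse(link.strip())
    host = parsed.netloc.lower()
    path = parsed.path
    if parsed.scheme == "ftp":
        return "ftp"
    # DOIs resolve through a doi.org host and always begin with the "10." prefix.
    if host in DOI_HOSTS and path.startswith("/10."):
        return "doi"
    if path.lower().endswith(FILE_EXTENSIONS):
        return "file"
    # No path and no query string: the link names a site, not a specific record.
    if path in ("", "/") and not parsed.query:
        return "homepage"
    return "url"


def summarize(links):
    """Count how many links fall into each category."""
    return Counter(classify_link(link) for link in links)


if __name__ == "__main__":
    sample = [
        "https://doi.org/10.1371/journal.pone.0304781",
        "ftp://ftp.example.org/pub/data",
        "https://example.org/",
        "https://example.org/data/run1.csv",
        "https://example.org/records/42",
    ]
    for category, count in sorted(summarize(sample).items()):
        print(category, count)
```

A classification of this kind only sorts links by form. Whether a link is still available has to be tested separately, as the study did by scraping and then checking failures by hand.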


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f857/11152257/8160a9d57ed9/pone.0304781.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f857/11152257/967f4fbaaa9c/pone.0304781.g001.jpg

Similar articles

1
Measuring data rot: An analysis of the continued availability of shared data from a Single University.
PLoS One. 2024 Jun 5;19(6):e0304781. doi: 10.1371/journal.pone.0304781. eCollection 2024.
2
Long-term availability of data associated with articles in PLOS ONE.
PLoS One. 2022 Aug 24;17(8):e0272845. doi: 10.1371/journal.pone.0272845. eCollection 2022.
3
How do astronomers share data? Reliability and persistence of datasets linked in AAS publications and a qualitative study of data practices among US astronomers.
PLoS One. 2014 Aug 28;9(8):e104798. doi: 10.1371/journal.pone.0104798. eCollection 2014.
4
Disappearing act: decay of uniform resource locators in health care management journals.
J Med Libr Assoc. 2009 Apr;97(2):122-30. doi: 10.3163/1536-5050.97.2.009.
5
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
6
404 not found: the stability and persistence of URLs published in MEDLINE.
Bioinformatics. 2004 Mar 22;20(5):668-72. doi: 10.1093/bioinformatics/btg465. Epub 2004 Jan 22.
7
The continued problem of URL decay: an updated analysis of health care management journal citations.
J Med Libr Assoc. 2022 Oct 1;110(4):463-470. doi: 10.5195/jmla.2022.1456.
8
The citation advantage of linking publications to research data.
PLoS One. 2020 Apr 22;15(4):e0230416. doi: 10.1371/journal.pone.0230416. eCollection 2020.
9
URL decay in MEDLINE--a 4-year follow-up study.
Bioinformatics. 2008 Jun 1;24(11):1381-5. doi: 10.1093/bioinformatics/btn127. Epub 2008 Apr 15.
10
Pilot study of linking Web-based supplemental interpretive information to laboratory test reports.
Am J Clin Pathol. 2009 Dec;132(6):818-23. doi: 10.1309/AJCPT7CHN8DLFVGU.
