Wren Jonathan D, Georgescu Constantin, Giles Cory B, Hennessey Jason
Oklahoma Medical Research Foundation, Oklahoma City, Arthritis and Clinical Immunology Research Program, 825 N.E. 13th Street, Oklahoma City, OK 73104-5005, USA.
University of Oklahoma Health Sciences Center, Department of Biochemistry and Molecular Biology, 940 Stanton L. Young Blvd, Oklahoma City, OK 73104-5005, USA.
Nucleic Acids Res. 2017 Apr 20;45(7):3627-3633. doi: 10.1093/nar/gkx182.
Scientific Data Analysis Resources (SDARs) such as bioinformatics programs, web servers and databases are integral to modern science, but previous studies have shown that the Uniform Resource Locators (URLs) linking to them decay in a time-dependent manner, with ∼27% decayed to date. Because SDARs are overrepresented among science's most cited papers of the past 20 years, the loss of widely used SDARs could be particularly disruptive to scientific research. We identified URLs in MEDLINE abstracts and used crowdsourcing to identify which abstracts reported the creation of SDARs. We used the Internet Archive's Wayback Machine to approximate 'death dates' and to calculate citations per year over each SDAR's lifespan. At first glance, decayed SDARs did not differ significantly from available SDARs in average citations per year over their lifespan or in journal impact factor (JIF). However, the most cited SDARs were 94% likely to have been relocated to another URL, versus only 34% of uncited ones. Taking relocation into account, we find that, after time since publication, citations are the strongest predictor of current online availability, and JIF is modestly predictive. This suggests that URL decay is a general, persistent phenomenon affecting all URLs, but that the most useful or widely recognized SDARs are more likely to persist.
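The lifespan-based citation rate described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so everything here is an assumption: the `(date, http_status)` snapshot format, the rule that the last successful archived snapshot approximates the 'death date', and the helper names `approximate_death_date` and `citations_per_year` are all hypothetical.

```python
from datetime import date


def approximate_death_date(snapshots):
    """Return the date of the last successful snapshot, or None if there is none.

    `snapshots` is an iterable of (date, http_status) pairs. This input format
    is a hypothetical stand-in for archived capture records, not the paper's.
    """
    ok_dates = [day for day, status in snapshots if 200 <= status < 300]
    return max(ok_dates) if ok_dates else None


def citations_per_year(total_citations, published, end):
    """Average citations per year between publication and the end of the lifespan."""
    lifespan_years = (end - published).days / 365.25
    if lifespan_years <= 0:
        raise ValueError("end of lifespan must fall after the publication date")
    return total_citations / lifespan_years


# Illustrative, made-up capture history for a single URL.
snapshots = [
    (date(2005, 3, 1), 200),
    (date(2009, 6, 15), 200),
    (date(2011, 1, 10), 404),
]
death = approximate_death_date(snapshots)
rate = citations_per_year(50, date(2004, 6, 15), death)
print(death, round(rate, 2))
```

For a resource that is still online, the same rate could be computed by passing the current date as `end` instead of an estimated death date.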