Ducut Erick, Liu Fang, Fontelo Paul
Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda MD, USA.
BMC Med Inform Decis Mak. 2008 Jun 11;8:23. doi: 10.1186/1472-6947-8-23.
For years, Uniform Resource Locator (URL) decay, or "link rot", has been a growing concern in the biomedical sciences. This paper addresses the issue by examining the status of URLs published in MEDLINE abstracts, establishing their current availability and estimating URL decay in these records from 1994 to 2006. We also reviewed the content served at each URL to determine whether the information the author cited when writing the paper is the same information presently available there. Lastly, of the documented methods recommended for preserving URLs, we determined which have gained acceptance among authors and publishers.
MEDLINE records from 1994 to 2006, obtained from the National Library of Medicine in Extensible Markup Language (XML) format, were processed, yielding 10,208 URL addresses. Each URL was accessed once daily at a random time for 30 days. Titles and abstracts were also searched for the presence of archival tools such as WebCite, Persistent URL (PURL) and Digital Object Identifier (DOI).
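The extraction and archival-tool search described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the regular expressions, the function names `extract_urls` and `scan_records`, and the choice of the `Article`, `ArticleTitle` and `AbstractText` elements as the fields to scan are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed pattern: any http, https or ftp URL up to whitespace or a closing delimiter.
URL_RE = re.compile(r"(?:https?|ftp)://[^\s<>\"')\]]+", re.IGNORECASE)

# Assumed heuristics for spotting mentions of archival tools in free text.
TOOL_PATTERNS = {
    "DOI": re.compile(r"\bdoi\b|\b10\.\d{4,9}/\S+", re.IGNORECASE),
    "PURL": re.compile(r"\bpurl\b", re.IGNORECASE),
    "WebCite": re.compile(r"webcitation\.org|\bwebcite\b", re.IGNORECASE),
}


def extract_urls(text):
    """Return URLs found in `text`, with trailing sentence punctuation removed."""
    return [m.group(0).rstrip(".,;:") for m in URL_RE.finditer(text)]


def scan_records(xml_text):
    """Scan the title and abstract of each Article element in an XML string.

    Returns one dict per article with the URLs found and the archival
    tools mentioned (sorted by name).
    """
    root = ET.fromstring(xml_text)
    results = []
    for article in root.iter("Article"):
        title = article.findtext("ArticleTitle", default="")
        abstract = " ".join(
            "".join(node.itertext()) for node in article.iter("AbstractText")
        )
        text = f"{title} {abstract}"
        tools = sorted(
            name for name, pattern in TOOL_PATTERNS.items() if pattern.search(text)
        )
        results.append({"urls": extract_urls(text), "tools": tools})
    return results
```

A pattern-based search of this kind will both miss and over-match some cases (for example, URLs printed without a scheme), so counts produced this way would need manual review.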
Results showed that URL length ranged from 13 to 425 characters, with a mean length of 35 characters [Standard Deviation (SD) = 13.51; 95% confidence interval (CI) 13.25 to 13.77]. The most common top-level domains were ".org" and ".edu", each accounting for 34% of URLs. About 81% of the URL pool was available 90% to 100% of the time, but only 78% of these contained the actual information mentioned in the MEDLINE record. "Dead" URLs constituted 16% of the total. Finally, a survey of archival tool usage showed that, since the introduction of DOI in 1998, only 519 of the abstracts reviewed had incorporated DOI addresses.
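The availability figures above depend on how 30 daily checks per URL are turned into categories. The sketch below shows one way to do that, under stated assumptions: a URL is labelled "available" if it responded on at least 90% of checks (matching the 90% to 100% band in the text), "dead" if it never responded, and "intermittent" otherwise. The "dead" and "intermittent" definitions, the 10-second timeout and all function names are hypothetical and are not taken from the study.

```python
import http.client
import urllib.request
from collections import Counter


def check_once(url, timeout=10.0):
    """Return True if one request for `url` succeeds.

    Success means an HTTP 2xx status, or a non-HTTP scheme (such as ftp)
    that opens without raising an error.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = getattr(resp, "status", None)
            return status is None or 200 <= status < 300
    except (OSError, ValueError, http.client.HTTPException):
        return False


def availability(checks):
    """Fraction of checks (a sequence of booleans) that succeeded."""
    if not checks:
        raise ValueError("at least one check is required")
    return sum(checks) / len(checks)


def classify(checks, high=0.9):
    """Label a URL from its check history (thresholds are assumptions)."""
    fraction = availability(checks)
    if fraction >= high:
        return "available"
    if fraction == 0:
        return "dead"
    return "intermittent"


def summarize(log):
    """Map each label to its share of URLs in `log` (url -> list of booleans)."""
    counts = Counter(classify(checks) for checks in log.values())
    total = len(log)
    return {
        label: counts[label] / total
        for label in ("available", "intermittent", "dead")
    }
```

In use, `check_once` would be called once per URL per day and its result appended to that URL's list; `summarize` then reports the share of URLs in each category at the end of the observation period.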
URL persistence in this study parallels that reported in previous studies, with approximately 81% general availability during the 1-month study period. As peer-reviewed literature remains the main source of information in biomedicine, we need to ensure the accuracy and preservation of these links.