Sharma Nitesh Kumar, Ayyala Ram, Deshpande Dhrithi, Patel Yesha M, Munteanu Viorel, Ciorba Dumitru, Fiscutean Andrada, Vahed Mohammad, Sarkar Aditya, Guo Ruiwei, Moore Andrew, Darci-Maher Nicholas, Nogoy Nicole A, Abedalthagafi Malak S, Mangul Serghei
Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, 1540 Alcazar Street, Los Angeles, CA 90033, USA.
Quantitative and Computational Biology Department, USC Dana and David Dornsife College of Letters, Arts, and Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA.
bioRxiv. 2023 Aug 7:2023.07.31.551384. doi: 10.1101/2023.07.31.551384.
Data-driven computational analysis is becoming increasingly important in biomedical research as the amount of data being generated continues to grow. However, research outputs such as data, source code, and methods are not routinely shared, which undermines the transparency and reproducibility of studies, both of which are critical to the advancement of science. Many published studies are not reproducible because insufficient documentation, code, and data are shared. We conducted a comprehensive analysis of 453 manuscripts published between 2016 and 2021 and found that 50.1% of them failed to share their analytical code. Even among those that did disclose their code, the vast majority failed to offer additional research outputs, such as data. Furthermore, only one in ten papers organized their code in a structured and reproducible manner. We found a significant association between the presence of code availability statements and increased code availability (p=2.71×10). Additionally, a greater proportion of studies conducting secondary analyses shared their code than of those conducting primary analyses (p=1.15×10). In light of these findings, we propose raising awareness of code sharing practices and taking immediate steps to enhance code availability in order to improve reproducibility in biomedical research. Increasing transparency and reproducibility can promote scientific rigor, encourage collaboration, and accelerate scientific discovery. We must prioritize open science practices, including sharing code, data, and other research products, to ensure that biomedical research can be replicated and built upon by others in the scientific community.