Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, California, United States of America.
Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, United States of America.
PLoS Comput Biol. 2021 Nov 11;17(11):e1009481. doi: 10.1371/journal.pcbi.1009481. eCollection 2021 Nov.
Functional, usable, and maintainable open-source software is increasingly essential to scientific research, but there is a large variation in formal training for software development and maintainability. Here, we propose 10 "rules" centered on 2 best practice components: clean code and testing. These 2 areas are relatively straightforward and provide substantial utility relative to the learning investment. Adopting clean code practices helps to standardize and organize software code in order to enhance readability and reduce cognitive load for both the initial developer and subsequent contributors; this allows developers to concentrate on core functionality and reduce errors. Clean coding styles make software code more amenable to testing, including unit tests that work best with modular and consistent software code. Unit tests interrogate specific and isolated coding behavior to reduce coding errors and ensure intended functionality, especially as code increases in complexity; unit tests also implicitly provide example usages of code. Other forms of testing are geared to discover erroneous behavior arising from unexpected inputs or emerging from the interaction of complex codebases. Although conforming to coding styles and designing tests can add time to the software development project in the short term, these foundational tools can help to improve the correctness, quality, usability, and maintainability of open-source scientific software code. They also advance the principal point of scientific research: producing accurate results in a reproducible way. In addition to suggesting several tips for getting started with clean code and testing practices, we recommend numerous tools for the popular open-source scientific software languages Python, R, and Julia.
功能齐全、易于使用且可维护的开源软件对于科学研究越来越重要,但软件开发和可维护性的正式培训差异很大。在这里,我们提出了 10 条“规则”,这些规则以两个最佳实践组件为中心:代码整洁和测试。这两个领域相对简单,与学习投资相比,提供了大量实用价值。采用整洁的代码实践有助于标准化和组织软件代码,以提高可读性并降低初始开发人员和后续贡献者的认知负担;这使开发人员能够专注于核心功能并减少错误。整洁的编码风格使软件代码更易于测试,包括最适合模块化和一致的软件代码的单元测试。单元测试询问特定的和孤立的编码行为,以减少编码错误并确保预期的功能,尤其是在代码变得更加复杂时;单元测试还隐式提供了代码的示例用法。其他形式的测试旨在发现由于意外输入或复杂代码库的交互而产生的错误行为。虽然在短期内遵守编码风格和设计测试会增加软件开发项目的时间,但这些基础工具可以帮助提高开源科学软件代码的正确性、质量、可用性和可维护性。它们还推进了科学研究的主要观点:以可重复的方式产生准确的结果。除了提出一些有关开始使用整洁的代码和测试实践的提示外,我们还为流行的开源科学软件语言 Python、R 和 Julia 推荐了许多工具。