Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, United States of America.
Dept of Biology, New York University, New York, NY, United States of America.
PLoS Comput Biol. 2020 May 4;16(5):e1007507. doi: 10.1371/journal.pcbi.1007507. eCollection 2020 May.
Many scientific disciplines rely on computational methods for data analysis, model generation, and prediction. These methods are often implemented by researchers with domain expertise but without formal training in software engineering or computer science. This arrangement has led to underappreciation of the sustainability and maintainability of scientific software tools developed in academic environments. Some software tools have avoided this fate, including the scientific library Rosetta. We use this software and its community as a case study to show how modern software development can be accomplished successfully, irrespective of subject area. Rosetta is one of the largest software suites for macromolecular modeling, with 3.1 million lines of code and many state-of-the-art applications. Since the mid-1990s, the software has been developed collaboratively by the RosettaCommons, a community of academics from over 60 institutions worldwide with diverse backgrounds including chemistry, biology, physiology, physics, engineering, mathematics, and computer science. Developing this software suite has given us more than two decades of experience in how to effectively develop advanced scientific software in a global community with hundreds of contributors. Here we illustrate how this development community functions by addressing technical aspects (such as version control, testing, and maintenance), community-building strategies, diversity efforts, software dissemination, and user support. We demonstrate how modern computational research can thrive in a distributed collaborative community. The practices described here are independent of subject area and can be readily adopted by other software development communities.